Sequenced from the start


Four US studies are set to explore how genomic data can best help healthy and ill newborns.


They must also settle some questions of ethics. 


Genome sequencing has established itself as a powerful tool for
diagnosis, but it is not yet clear how useful it will be for disease
prevention or health management. A US$25-million project
announced last week aims to explore that issue in perhaps the most 
high-stakes patient group: newborn babies. 

In the Genomic Sequencing and Newborn Screening Disorders 
(GSNSD) programme, four teams will sequence the exomes — the 
protein-coding portions of the genome — or the whole genomes of 
more than 1,500 babies, including not only infants who are ill, whether 
or not the disease has been diagnosed, but also healthy babies. The 
programme is funded by the US National Human Genome Research 
Institute and the Eunice Kennedy Shriver National Institute of Child
Health and Human Development (NICHD). The studies will examine 
how useful sequencing information is for families and doctors, and 
whether it is superior to data gathered through conventional newborn- 
screening methods, which check for about 60 genetic disorders. 

The project joins a short but growing list of studies testing the utility 
of clinical sequencing in otherwise healthy individuals, but it is the 
first to focus on healthy and ill babies. As such, it will highlight five 
hot questions. 

First, do we yet know enough about how genes code for health to 
make genomic data useful in preventing disease? Studies have found 
that sequencing can diagnose 15-50% of children with otherwise 
undiagnosable illnesses, but no one has yet asked what use it has for 
healthy children. Not all genetic traits will influence a person’s health, 
and it is still not possible to say with any certainty what a given genetic 
variant will mean for a given individual. 

Second, what kind of genetic findings should doctors return to 
patients, and does the answer differ between children and adults, or 
between ill and healthy people? The family that is unsure whether its ill 
baby will live or die is not in a good position to absorb information about 
a hypothetical future cancer risk. The family that has a baby’s genome 
sequenced just to see what might be found may spend years worrying 
about that cancer risk in their perfectly healthy child. The key will be 
to find the children who will best benefit from this knowledge, because 
their individual disease risks are real enough that routine screening 
could save their lives. In the US health-care system, which is prone to 
over-diagnosis and over-treatment of cancer, for example, this is a tricky 
balance. Some of the GSNSD projects will check babies’ genomes for 
genes not linked to any immediate illness, although each study is taking 
a different approach to how it will inform parents about risks. 

Third, what is the quickest and cheapest way to conduct clinical 
sequencing so that it returns accurate information to patients in time 
to influence care decisions? Increasing the number of conditions being 
screened for will necessarily cause more false positives — as many 
as 20 for every true positive, according to one estimate. Those false 
positives will lead to increased medical costs and anxiety for families. 
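
To put the cited estimate in perspective, the arithmetic can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the single input is the editorial's figure of as many as 20 false positives for every true positive, not a result from any of the GSNSD studies.

```python
def positive_predictive_value(true_positives: float, false_positives: float) -> float:
    """Return the fraction of positive screening results that are genuine."""
    total = true_positives + false_positives
    if total <= 0:
        raise ValueError("need at least one positive result")
    return true_positives / total


# Cited estimate: as many as 20 false positives for every true positive.
ppv = positive_predictive_value(true_positives=1, false_positives=20)
print(f"Share of positive results that are real: {ppv:.1%}")
```

At that ratio, fewer than 1 in 20 positive results would reflect a real condition, which is why the cost and anxiety of follow-up matter so much.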

Fourth, who owns the genetic data? None of the GSNSD studies
plans to give the raw genetic data to the children’s families, even though
that could allow the children to benefit from it throughout their lives.
Finally, should the data be shared with other researchers? This would 
be the best way for scientists to help tackle the tough question of how 
genes contribute to disease. But it is increasingly difficult to guarantee
the privacy of genetic data (see Nature 493, 451; 2013), and this is an
important issue for babies, whose information will be known for their
entire lives even though they themselves have not consented to the
disclosure. One of the GSNSD projects will share data with the NICHD’s
Newborn Screening Translational Research Network, and another with the
National Center for Biotechnology Information’s Database of Genotypes
and Phenotypes. The other two are still deciding.

As researchers explore these questions, sequencing costs continue 
to drop and the day when all children will be sequenced at birth — if 
not before — draws ever nearer. Some people are wary of this, and are 
already warning of the dangers of what they consider to be a govern- 
ment-funded plan to store all citizens’ data. If newborn sequencing is 
to fulfil its potential to save many children’s lives, it is imperative that 
scientists get the ethics and the science right.


Under threat


The grey wolf is at risk of losing its endangered 
status under US law. 


decide whether wolves were illegal immigrants. Ranching groups 

that were against the proposed reintroduction of the wild animals 
to Idaho, Wyoming and Montana were trying to block their transport 
across the border from Canada. The appeal failed and the foreign 
wolves were delivered and released. 

In recent decades, many more in the legal profession have become 
familiar with Canis lupus. The grey wolf, and its place in the US land- 
scape, sharply divides opinion — both scientific and political. Broadly 
speaking, conservationists want the wolf population to expand into its 
historical range, whereas the ranching community is anxious about 
large numbers of a top predator roaming free. Both sides can point 
to scientific research and ecological opinion to support their stance. 

The battle over the fate of the grey wolf is gearing up for a new con- 
flict, perhaps the most significant yet. As we report on page 143, the 
US Fish and Wildlife Service (FWS) has extended the period for which 
it will allow public comment on its controversial proposal to strip the
grey wolf of its federal protection. The agency wants to remove the 
animal from the list of those protected under the 1973 Endangered 
Species Act, and to hand responsibility for its management and con- 
servation to individual states. The FWS claims that the move follows 
the “successful recovery” of the wolf in two key regions. Scratch the 
surface, though, and it looks more like a cost-cutting exercise and, to 
some, a politically convenient one. Gary Frazer, assistant director for 
endangered species at the FWS, told Nature that he expects hundreds 
of thousands of people to comment. 

Many of these will be rightly suspicious of the true motives. The 
proposed delisting of the grey wolf comes barely two years after the 
notorious ‘wolf rider’ that saw a clause to remove legal protection of 
the animals in Montana and Idaho tagged by local politicians to an 
unrelated, and essential, budget appropriations bill. The move, this 
journal noted at the time, set a “dangerous precedent” (see Nature 475, 
5; 2011), and was the first time that Congress had removed a species 
from the list. The clumsy political manoeuvre came after a decade of 
court-rebuffed attempts to change the status of the wolf through the 
proper regulatory channels. 

Grey wolves are certainly doing better in the United States than a cen- 
tury or so ago, when rewards for their killing made them locally extinct. 
Controlled reintroduction under the 1973 act has led to populations in 
the thousands around the Great Lakes and Northern Rockies — and to 
the loss of livestock. According to the US Department of Agriculture, 
between 1995 and 2007, wolves killed 298 cattle, 46 sheep, 13 llamas, 24 
goats and 7 horses in Montana. Enough is enough, critics say; the grey 
wolf is no longer endangered. Yet the 1973 act is clear: such a judgement 
must be made over all, or a significant proportion, of the animal's range. 

“We still haven't figured out how to handle a situation where experts
have outspoken views,” Frazer says of the divergent opinions on the
topic. It is a lament that will strike a chord with many policy-makers,
not least those in Britain, where government-sanctioned marksmen are 
busy reducing the population of another emblematic — and previously 
protected — species. After years of similar arguments and conflicting 
scientific advice, the environment department DEFRA has embarked 
on two pilot culls of badgers, which farmers blame for the spread of
bovine tuberculosis (TB).

Ian Boyd, science adviser to DEFRA, writes on page 159 that, too 
often, the evidence used to set policy is biased and unreliable, even 
when published in scientific journals. He wants to introduce a Kitemark
(indicator of quality approval) for studies that meet an audited standard
of scientific evidence. But he should be careful what he wishes for. As
Nature has repeatedly pointed out, the published evidence on badger culls
does not indicate that bovine TB will be reduced by DEFRA’s strategy, which
relies on untested tactics such as free shooting.

Politics can trump science, of course, for politicians are elected 
to make decisions. But so can sentiment. A few miles along the M4 
motorway from where the badger culls and the protests against them 
are under way is the base of the Wolf Conservation Trust. Wolves 
vanished from Britain centuries ago, but they retain mystique and 
appeal — even to Brits. US lawmakers must bear this in mind as they 
invoke science to argue for the delisting of the grey wolf. 

The protection of the 1973 Endangered Species Act for vulnerable 
animals does not end at the US border. Several overseas and foreign 
species are listed too, and US citizens are forbidden from, for instance, 
trading in them. But a US law that gives sanctuary to the Chinese 
alligator and the great Indian bustard but not to the native grey wolf 
would be a strange beast indeed.


Reality at risk 


Don’t treat a memoir as anything other than
one person’s interpretation of events. 


“It was the afternoon of my eighty-first birthday and I was in bed
with my catamite when Ali announced that the archbishop had
come to see me.” So runs the first sentence of Earthly Powers,
Anthony Burgess’s memoir of the fictional novelist Kenneth Toomey. 
“I have lost none of my old cunning in the contrivance of what is
known as an arresting opening,” he writes a few lines later, noting
that, whereas every supposed fact in the first sentence is true, the con- 
text is one of pure artifice, designed to portray an image of the writer 
as he would wish to be seen, not necessarily as he really is. Toomey is 
clearly a writer of some skill (as is Burgess, his inventor and rappor- 
teur), so we, the poor readers, are at his mercy. We can do no other 
than take what he claims as truth at face value, whether it is true or not. 
Such is the caveat emptor of the memoir in general. 

This week, Nature publishes book reviews of two memoirs by very 
real people. On page 162, Robert Crease reviews My Brief History by 
theoretical physicist Stephen Hawking, and notes that it is more akin 
to a PR exercise than a warts-and-all confession: “It does not take the 
reader behind any scenes... It is a concise, gleaming portrait, not unlike
those issued by the public relations department of an institution.” 
Eugenie Scott on page 163, by contrast, finds An Appetite for Wonder, 
the first volume of memoirs by evolutionary biologist Richard Dawkins, 
“a very honest book”, in which we get a taste of the upbringing and early
experiences of the author of The Selfish Gene and The God Delusion. 
From which we are entitled to get a flavour, at least, of how Dawkins 
got to be the man he is today — in other words, what makes him tick. 


But do we? Get a flavour of what makes someone tick from their own,
self-selected, self-redacted reminiscence? Memoirs are more vehicles of 
entertainment than any reflection of reality. When one reads The Double 
Helix, James Watson's knockabout account of the discovery of the struc- 
ture of DNA, one should take any facts presented therein strictly as hav- 
ing been heavily filtered by the unashamedly biased reminiscence of just 
one of the protagonists, not as a scholarly account. And there's nothing 
wrong with that. The Double Helix works beautifully as entertainment. 

There is another layer of selection. Those memoirs that get published 
as books are not so much about scientists (say) as celebrities. Readers 
of My Brief History will want to know about Hawking’s triumph over 
his disability more than how he came to this or that conclusion about 
black holes. More people are likely to have encountered Dawkins as 
the doctrinaire neo-atheist of The God Delusion than as the peerless 
commentator on the machinery of evolution in The Selfish Gene — and 
vastly more than as the author of scientific papers on animal behaviour. 

To understand what working scientists free from the constraints of 
celebrity actually do all day, one might turn to the blogosphere and 
follow (to mention just two of thousands) Jenny Rohn’s “Mind the
gap” (http://occamstypewriter.org/mindthegap) and, perhaps more
pertinently, the notes of the anonymously eponymous Female Science 
Professor (http://science-professor.blogspot.co.uk). 

Even then, such writings demonstrate the self-selection of those 
scientists (a tiny proportion) who feel that they have something 
to say. For everyone else, life is something that is lived undocu- 
mented, unshared and in real time. Perhaps the only really ‘true’ 
experiences are those that one has lived oneself. To which one can 
only ask whether one is talking to oneself, or 
whether the Universe has gone solipsistic all of a 
sudden. Then again, to quote that koan from 
Jewish Buddhist wisdom — if there is no self, 
whose arthritis is this?
It is time to update
US biomedical funding


The effects of federal budget cuts provide an opportunity to revisit the funding
structure of the National Institutes of Health, says Frederick Grinnell.


complete budget appropriations for the fiscal year 2014 with no
realistic prospect of success, drastic cuts to public spending on
science seem likely to continue. The effect has been severe: many
grant-renewal applications at the US National Institutes of Health (NIH),
which were easily fundable a few years ago, are not even close to being
funded this year. Things are tough, as reflected by an outpouring of
comment and opinion in the media about the potential decline of US science.

We have been here before. Around 1970, US spending on science
was cut to help pay for the Vietnam War. Faculty positions were scarce
and winning a research grant became much more difficult. But the
research community survived and thrived. I believe that we will
survive the current cuts, but because of their depth and arbitrary nature,
their effects will take years to overcome.

We should take this opportunity to examine and debate the NIH funding
structure for biomedical research, and focus on how it could be made
stronger, more resilient and more diverse in the future. To begin, here
are four suggestions.

First, we should revisit the relationship between how NIH grants are
assessed and funded. Grant applications are evaluated on a relative
scale, but are funded in absolute terms: all or nothing. This is
illogical. Applications that fall on either side of the funding cut-off,
or payline, are more or less of equal quality. As overall success rates
decline, this practice becomes more difficult to justify. A better
approach would be to link funding levels with the percentile scores used
to rank applications. NIH institutes should agree on the total number of
grants to be funded, then give full funding to applications with the
best scores and partial funding to those with slightly lower scores.

If the total available funds declined, then the curve of percentage
funding versus percentile score would become steeper, perhaps
down to 50% of the approved budget for grants with percentile scores
just within the payline. If funds increased, then the curve would
flatten out. In times of severe budget constraint, this would allow more
laboratories to stay open, albeit on a smaller scale, preserving
research infrastructure and lab continuity. Many will argue that this
would produce labs that are unable to achieve their proposed specific
aims, but investigators could always drop an aim.

Second, we should look closer at research productivity and the size
of grants. Some data on the scientific output of researchers funded
by the US National Institute of General Medical Sciences (NIGMS)
suggest that productivity (measured by number [...]
dollars, however, smaller labs were more productive per dollar.

The NIGMS findings are preliminary. Moreover, some types of 
research are much more expensive than others. However, if NIH 
scientific-review groups evaluating grant-renewal applications paid 
more attention to normalized rather than total productivity, then prin- 
cipal investigators with smaller labs might become more competitive 
for funding, resulting in a wider distribution of research dollars. 
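
The sliding scale proposed in the first suggestion (full funding for the best percentile scores, partial funding for those just inside the payline) can be made concrete with a small sketch. Everything specific here is an assumption for illustration only: the linear shape of the curve, the function name, and the example cut-offs of 10 and 20 are not NIH policy. Only the 50% floor echoes the figure mentioned in the text.

```python
def funding_fraction(percentile: float, full_cutoff: float, payline: float,
                     floor: float = 0.5) -> float:
    """Share of the approved budget awarded for a given percentile score.

    Lower percentiles are better. Scores at or better than `full_cutoff`
    receive 100%. Between `full_cutoff` and `payline` the share falls
    linearly to `floor`. Scores beyond the payline receive nothing.
    """
    if not 0 <= full_cutoff < payline:
        raise ValueError("require 0 <= full_cutoff < payline")
    if not 0 < floor <= 1:
        raise ValueError("floor must be in (0, 1]")
    if percentile <= full_cutoff:
        return 1.0
    if percentile > payline:
        return 0.0
    # Linear interpolation from 100% at full_cutoff down to `floor` at the payline.
    return 1.0 - (1.0 - floor) * (percentile - full_cutoff) / (payline - full_cutoff)


# Hypothetical cut-offs: full funding to the 10th percentile, payline at the 20th.
for score in (5, 15, 20, 21):
    share = funding_fraction(score, full_cutoff=10, payline=20)
    print(f"percentile {score}: {share:.0%} of approved budget")
```

Tightening the budget corresponds to lowering `full_cutoff` or `floor`, which steepens the curve; a larger budget flattens it.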

At present, when a scientist wins an NIH grant, a percentage of the 
funding is often used for his or her salary. In my view, that percentage 
is too high when faculty members are asked by their institutions to 
raise most or all of their own salaries through grant awards. Doing 
so creates both potential conflicts of commitment and challenges to 
research integrity. Faculty members with NIH funding bring benefits 
beyond the successful completion of research 
projects. They boost student education and 
the development of intellectual property and 
technology transfer. My third suggestion is that 
universities should chip in more to pay these 
academics, freeing up federal money for the 
research itself. 

My final suggestion is to consider the 
broader impact of research grants. As pres- 
sure on public funds intensifies, scientists 
are increasingly being asked to articulate the 
economic benefits of their discoveries. Yet the 
economic impact of biomedical research goes 
beyond its ability to improve human health. It includes education of 
the scientific workforce, expansion of institutional and community 
resources, and development of regional technology centres. 

Funding for academic research and technology development tends 
to be concentrated in a relatively small number of institutions. If ‘scien- 
tific merit’ remains the singular basis for decision-making on grants, 
then historically underfunded institutions are at a disadvantage — 
and the wider economic benefits cannot be shared. Yet incremen- 
tal increases in funding would have a bigger relative impact for such 
institutions than for those already receiving the most research dollars. 
Perhaps grant applications from underfunded institutions should be 
funded at a different payline. That would be controversial but there 
is precedent: the NIH has already modified the payline to increase 
funding for new investigators, another underfunded group. 

These are important points to debate, and some will be unpopular. 
The last consultation effort with the broad US biomedical community 
was in 1992, when I and many others served on the NIH Strategic 
Plan Task Force. That failed because the process was too top-down. 
We should try again.


Frederick Grinnell is in the Department of Cell Biology at the 
University of Texas Southwestern Medical Center in Dallas. 
e-mail: frederick.grinnell@utsouthwestern.edu 
RESEARCH HIGHLIGHTS
Selections from the scientific literature


CHEMISTRY 


Power from 
deep-sea vents 


Researchers have harnessed 
deep-sea hydrothermal vents 
to produce electricity. 

Masahiro Yamamoto 
of the Japan Agency for 
Marine-Earth Science and 
Technology in Yokosuka 
and Ryuhei Nakamura 
of the RIKEN Center for 
Sustainable Resource Science 
in Wako and their team 
exploited the differences in 
chemistry between sea water 
and fluids that leak from 
hydrothermal vents. Steep 
concentration gradients of 
chemicals such as hydrogen 
sulphide allowed the 
researchers to generate more 
than 21 milliwatts of power 
from a fuel cell based on a
platinum cathode and an 
iridium anode. 

This successfully powered 
three light-emitting diodes 
1,000 metres below the 
surface, both at an artificial 
vent created by deep-sea 
drilling and a natural vent. 
Angew. Chem. http://doi.org/f2dtrm (2013)


GEOLOGY 


Meet the world’s 
largest volcano 


A submarine mountain in the 
northwest Pacific could be the 
largest single volcano on Earth, 
rivalling even the mighty 
Olympus Mons on Mars.

Tamu Massif, situated 
1,500 kilometres east of Japan, 
is roughly the size of the 
British Isles. Seismic-profiling 
studies penetrated its depths 
and revealed that its lava flows 
dip away from the volcano’s 
summit in all directions. 
This suggests that all the lava 
came from a single eruptive 
vent, say William Sager of
the University of Houston in
Texas and his colleagues.

The dome-shaped mountain
probably formed in a single
eruptive blast that began some
140 million years ago.

Nature Geosci. http://doi.org/nqd (2013)

For a longer story on this research,
see go.nature.com/beeqp4


ANIMAL BEHAVIOUR


Clumping caterpillars


Entomologists sometimes see caterpillars clumping together,
but the reason for this behaviour has defied explanation.
Researchers had proposed that the aggregations conserved
water or energy, but John Terblanche of Stellenbosch
University in South Africa and his colleagues have dispatched
those ideas. They collected Cape Lappet moth caterpillars
(Eutricha capensis) and reared them as individuals and in
groups of up to 100. Metabolic rates and water use did not
decrease with group size.

Given that aggregating caterpillars did not conserve energy,
the researchers suggest that the behaviour may confer other
advantages, such as faster growth rates or safety in numbers.
J. Exp. Biol. http://doi.org/npp (2013)


US forests grow
to be different 


Trees have made a major 
comeback in the northeastern 
United States, but the 
regrown forest is in some
ways markedly different from
that of 400 years ago, when 
European colonists started 
clearing trees for agriculture. 
Jonathan Thompson of the 
Smithsonian Conservation 
Biology Institute in Front 
Royal, Virginia, and his 
colleagues examined land- 
survey records from 1620 to 
1825 in nine northeastern 
states. Colonial surveyors 
often divided up plots of land 
using ‘witness trees’ to mark 
the corners, generally noting 
the genus, so the authors were 
able to compare these data with 
modern forest inventories. 
Although most of the same 
tree species were present in 
both time periods (with the 
exception of the now-rare 
American chestnut, Castanea 
dentata), their relative 
abundances differ greatly. 
Colonists removed a forest 
dominated by species such as 
beech and hemlock conifers, 
but today’s trees form a canopy 
of mostly maple and birch. 
PLoS ONE 8, e72540 (2013) 


TECHNOLOGY 


A cheaper, quieter 
MRI machine 


Magnetic resonance imaging
(MRI) is expensive and noisy, and
requires bulky equipment.
It can also have side effects,
such as stimulating nerves
in patients. These problems
arise from the constant 
switching between positive 
and negative magnetic-field 
gradients used to manipulate 
the spin of hydrogen nuclei 
throughout the patient’s body. 
The energized nuclei produce 
radiofrequency signals, which 
carry the information used to 
build up an image. 

By exploiting the 
radiofrequency pulses used to 
prepare the nuclei, Jonathan 
Sharp at Alberta Innovates 
Technology Futures in 
Calgary, Canada, and his 
colleagues removed the need
for switched magnetic fields.
Instead, they manipulated the
nuclei using pairs of resonant
radiofrequency fields twisted
in opposing directions and
a static magnetic field. The
technique could make MRI
cheaper, more accessible and quieter.
NMR Biomed. http://doi.org/nqf (2013)


LANGUAGE 


Babies hear a 
primate’s call 


Babies listen to lemur 
vocalizations in the same 
way that they listen to human 
speech. 

A baby’s language skills 
develop rapidly during the first 
year, and previous research has 
shown that by three months, 
hearing human speech while 
viewing objects helps infants to 
group objects into categories. 
Alissa Ferry at the International 
School for Advanced Studies in 
Trieste, Italy, and her colleagues 
examined how recordings of 
calls from a lemur (Eulemur 
macaco flavifrons; pictured) 
influenced how infants 
performed when they were 
asked to discriminate between 
images of dinosaurs and fish. 

The team found that 
lemur calls helped three- to 
four-month-old infants to 
categorize objects but did not 
help six-month-olds. The 
study suggests that the link 
between language and the 
capacity to categorize objects 
is initially broad enough to 
include calls from non-human 
primates, but quickly becomes 
tuned to human language. 
Proc. Natl Acad. Sci. USA http://doi.org/nqx (2013)


SYNTHETIC BIOLOGY 


Forcing fluorine 
into molecules 


Researchers have discovered 
a pathway for introducing 
fluorine atoms into naturally 
occurring molecules. 

Fluorine is present in up to 
30% of pharmaceuticals and 
can expand the usefulness 
of natural products by, for 
example, increasing the time 
they take to break down 
in the body. But until now, 
chemists have been using 
a single method — the 
fluoroacetate pathway — to 
insert fluorine into organic 
molecules. 

Michelle Chang of the 
University of California, 
Berkeley, and her colleagues 
have found a different 
way to insert atoms of the 
element into a useful group of 
molecules called polyketides. 
Their method enlists a soil 
bacterium (Streptomyces 
cattleya) for the first steps. 
The bacterium binds fluorine 
to carbon, making building 
blocks such as fluoroacetate 
monomers that can then be 
inserted into polyketides 
in the place of acetate. 

The team demonstrated 
the method in the 
laboratory and in living 
cells, in which they were 
able to control where the 
fluorine atoms ended up in the 
polyketide molecules. 

Science 341, 1089-1094 
(2013) 


PHYSICS 


A startling value
for gravitation 


The quest to pin down the 
fundamental constants of 
nature usually results in 
increased precision over 
time, but knowledge of 
the Newtonian constant 
of gravitation (G), known 
among physicists as Big G, 
has not improved much in 
recent years because different 
measurement methods 
continue to disagree. 

Now, Terry Quinn and his
colleagues at the International
Bureau of Weights and
Measures in Paris have
added to the uncertainty
by finding a value for G of
6.67545 × 10⁻¹¹ m³ kg⁻¹ s⁻².
This is significantly larger than
several other measurements
from the past decade, but
is in agreement with a 2001
result by the same group using
a similar but independent
experimental set-up.

The researchers used a
torsion balance (pictured),
in which a thin metal strip
changes orientation in
response to test masses. The
authors say that they do not
know why their apparatus
gives a different result from
other approaches.

Phys. Rev. Lett. 111, 101102
(2013)


COMMUNITY CHOICE


Nuclear receptor linked to fitness


Would-be dopers may have a new
target: the Rev-erb-α protein. Known to
regulate sugar and fat metabolism, the
nuclear receptor has now been linked to
the production and function of mitochondria, the cell’s
metabolic powerhouses.

A team led by Bart Staels and Hélène Duez, jointly at the
Lille II University of Health and Law and the Pasteur Institute in
Lille, France, showed that mouse muscle cells lacking Rev-erb-α
contain dysfunctional mitochondria, and that mice lacking the
gene encoding it, Nr1d1, could not run as fast as normal mice.
The reverse occurred when the protein was overexpressed or
when a synthetic agonist was given, suggesting that Rev-erb-α
could be targeted by drugs to improve exercise capacity by
boosting mitochondrial number and function in muscle cells.
Nature Med. 19, 1039-1046 (2013)


Monkeys raise the 
alarm on predators 


For the first time, researchers
have shown that non-human
primates emit calls in
specific sequences to
convey the type and
location of the threat.
Non-human primates
are known to produce calls
that signal different kinds of 
danger. Richard W. Byrne at 
the University of St Andrews, 
UK, and his colleagues 
conducted an experiment on 
five groups of titi monkeys 
(Callicebus nigrifrons) living in 
a reserve to find out whether 
monkey vocalization encoded 
predator type (raptor or 
snake), elevation (tree or 
ground) or both. 

A raptor in the air elicited 
only calls of type A; a raptor 
on the ground, calls of type A, 
followed by B. Conversely, a 
predator on the ground elicited 
pure B calls, but a ground 
predator in the trees brought 
forth B calls, followed by A. 
Biol. Lett. 9, 20130535 (2013)
SEVEN DAYS


RESEARCH
Dark-energy survey 


A massive sky-mapping
project to study dark energy
has begun, Fermilab in
Batavia, Illinois, announced on
3 September. The Dark Energy
Survey at the Cerro Tololo
Inter-American Observatory in
Chile launched on 31 August.
Over the next five years, it
will map 300 million galaxies
covering one-eighth of the
night sky. Data from its
570-megapixel camera may
lead to a better understanding
of the mysterious dark energy
that is thought to drive
the Universe's accelerating
expansion. See go.nature.com/ybcueb
for more.


Baby-genome plan 
Hundreds of babies in the 
United States will have their 
genomes sequenced as part of 
a US$25-million programme 
unveiled on 4 September by 
the US National Institute of 
Child Health and Human 
Development and the National 
Human Genome Research 
Institute. The agencies have 
chosen four research teams,
which will receive grants
over the next five years, to
test whether sequencing can
provide more information than
conventional tests that screen
newborns for genetic and other
disorders. See page 135 and
go.nature.com/izz7u9 for more.


High-tech probe 
Making better devices for 
measuring brain activity 

in animals is the goal ofa 
US$5.5-million collaboration 
announced on 4 September. 
Partners include the Howard 
Hughes Medical Institute 

in Chevy Chase, Maryland; 
the Allen Institute for Brain 
Science in Seattle, Washington; 
and the nanoelectronics- 
research company imec 

in Leuven, Belgium. The 
project, which also involves 
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NASA homes in on next Mars landing site 


The next US Mars lander will touch down on 
one of four candidate sites announced by NASA 
on 4 September. The lander, dubbed Interior 
Exploration Using Seismic Investigations, 
Geodesy and Heat Transport (InSight), is set to 
launch in March 2016 and will study the planet's 
deep interior to help researchers to understand 
how rocky planets formed. There were 22 


University College London 
and UK biomedical charity 

the Wellcome Trust, aims to 
pack more electrodes than ever 
onto brain probes that record 
signals between neurons. 

The group hopes to make the 
devices available for sale by the 
end of 2016. 


POLICY 


Gas phase-down 
Leaders of the Group of 20 
(G20), representing the world’s 
largest economies, agreed 

on 6 September to regulate 
hydrofluorocarbons (HFCs) 
under the Montreal Protocol 
on Substances that Deplete the 
Ozone Layer. Primarily used 
as refrigerants, HFCs were 
developed as alternatives to 
ozone-depleting chemicals, 
but they are powerful
greenhouse gases. Parties 

to the protocol will debate 
amending the treaty to 
implement a global phase-down
of HFCs at a meeting in
October in Bangkok. 


Carbon-tax repeal 
The newly elected prime 
minister of Australia, Tony 
Abbott, has committed 

to scrapping the previous 
government’s plans to put a
price on carbon emissions. 
He vowed as part of his 
campaign to repeal the 
carbon tax, which outgoing 
Prime Minister Kevin Rudd 
had already promised to 
soften to a carbon-trading 
scheme. Abbott, who beat 
Rudd in national elections 
on 7 September, said that it 


candidate sites; the semi-finalists were chosen
for their level and smooth terrain. The sites 

are all on an equatorial plain in an area called 
Elysium Planitia, close to where other landers 
have set up home (pictured). InSight will drill 
2-5 metres into the Martian surface, so scientists 
wanted sites that would not harbour large rocks 
or solid bedrock that could block the probe. 


would be “a very early item 
of business”. But Abbott may 
face a fight to push the repeal 
through parliament. 


FACILITIES 


Spanish clash 

Plans to build a particle 
accelerator and neutron source 
in Bilbao, Spain, have come to 
an abrupt halt. The governing 
board of the €180-million 
(US$238-million) European 
Spallation Source-Bilbao 
(ESS-Bilbao) announced on 

29 August that the project’s 
scientific director had been 
removed, and that the executive 
director’s contract would not 
be renewed. The board cited 

a need to analyse and refocus 
the initiative. ESS-Bilbao, 
which has encountered budget 


JPL-CALTECH/NASA 


shortfalls, was launched in 2009
as a test-bed for a larger facility 
planned for Lund, Sweden. 


PEOPLE 


Libel suit tossed 
On 4 September, the Taipei 
dismissed a criminal libel

suit against environmental 
engineer Ben-Jei Tsuang 

of National Chung Hsing 
University in Taichung. The 
petrochemical company 
Formosa Plastics Group (FPG) 
in Taipei had sued Tsuang for 
US$1.3 million in damages for 
suggesting that its emissions 
were linked to an increased risk 
of cancer. Tsuang presented his 
results at a scientific meeting in 
2010 and a press conference in 
2011, and has since submitted 
a paper on the topic. A civil 
lawsuit against Tsuang by FPG 
is still pending. See
go.nature.com/ukmxye for more.


AWARDS 


Balzan prizes 

Pascale Cossart (pictured), 

a microbiologist at the 

Pasteur Institute in Paris, 

has been awarded one of 

four 750,000-Swiss-franc 
(US$800,000) Balzan prizes 
for her work on the molecular 
biology of pathogenic bacteria 
and their interaction with host 
cells. Physicist Alain Aspect 
of the Ecole Polytechnique 

in Palaiseau, France, won 




With existing oil pipelines at 
near full capacity, increased 
production in the oil sands of 
Alberta, Canada, will depend 
on new projects such as the 


SOURCE: NRDC/US DEPT STATE 


controversial Keystone XL, says 
a report from the Sierra Club in 


San Francisco, California. The 


environmental group argues that 


halting Keystone — expected 
to carry about 730,000 barrels 
of oil per day from 2015 — 
would prevent an increase in 
greenhouse-gas emissions by 


impeding further development of 


carbon-rich oil in the region. 


TREND WATCH 


another of the prizes for his 
contributions to quantum 
information processing. 


Lasker award 

Richard Scheller of Genentech 
in South San Francisco, 
California, and Thomas 
Südhof of Stanford University
in California have won this 
year’s US$250,000 Lasker 
Award for Basic Medical 
Research for their work on the 
molecular mechanisms that 
underlie the rapid release of 
neurotransmitters in the brain. 
Winners of the award, which 
this year was announced on 

9 September, often go on to get 
a Nobel prize. See
go.nature.com/ubuzgy for more.


Ocean contest 

The X Prize Foundation is 
offering US$2 million in a
competition, sponsored by 

US philanthropist Wendy 
Schmidt, to improve ocean 

pH monitors, it announced 

on 9 September. Researchers 
can compete for one or both of 


OIL SAND EXPORTS 


two $1-million prize pots — 
one for the most accurate 

and another for the most 
cost-effective sensor. Subtle 
changes in ocean acidity can 
have marked effects on marine 
life, but existing sensors do not 
work well at depth or over long 
periods of time, and are too 
expensive to deploy widely. See 
go.nature.com/ttlysw for more. 


EVENTS 


Lunar orbiter up 

On 6 September, NASA 
launched the Lunar 
Atmosphere and Dust 
Environment Explorer 
(LADEE) from Wallops Flight 
Facility in Virginia. The orbiter 
should arrive at the Moon 

in about 30 days, and will fly 
20-50 kilometres above the 
surface. LADEE will collect 
dust and gas molecules in the 
lunar atmosphere, searching 
for silicon, magnesium and 
other elements ejected from 
Moon rocks. Researchers hope 
that data from LADEE could 
help to explain the twilight 
glow above the Moon's horizon 
seen by missions in the 1960s. 
See go.nature.com/j2tzmn 

for more. 


Labour row ends 
The Atacama Large 
Millimeter/submillimeter 
Array (ALMA) observatory 
restarted operations in Chile 
on 9 September, after a 
labour strike shut down the 


Companies need new pipelines to sustain production from 


Canadian oil sands. 


16-21 SEPTEMBER 
The Seventh 
International Congress 
on Advanced 
Electromagnetic 
Materials in Microwaves 
and Optics takes place 
in Bordeaux, France, 
where scientists will 
share the latest results in 
metamaterials research. 
Topics will include 
advances in cloaking, 
and medical applications 
of metamaterials. 
go.nature.com/yqmli6 


centre for 17 days. Contract 
negotiations had broken 
down between the union that 
represents most of ALMA’s 
Chilean staff and Associated 
Universities Incorporated, 
based in Washington DC, 
which employs the workers 
(see go.nature.com/vhxfrv). 
The two-year contract signed 
on 7 September includes 
shorter working shifts, a 4% 
pay rise for union workers at 
the low end of the wage scale, 
and a bonus for workers at the 
highest-elevation ALMA site. 


Drug resurrected 


On 4 September, AstraZeneca 
launched clinical studies 

of a previously abandoned 
experimental ovarian-cancer 
drug. Researchers had hoped 
that olaparib would herald a 
new class of drug targeting 

a DNA-repair protein called 
PARP, but the London-based 
pharmaceutical firm halted the 
drug's development in 2011 
after a failed clinical trial (see 
Nature 483, 519; 2012). Further 
analysis suggested that olaparib 
may work best against cancers 
bearing mutations that affect 
other DNA-repair proteins, 
leading the company to launch 
the latest trial in patients with 
BRCA-mutant cancers. 
Plan to offload labs vexes environmental scientists p.145
Chemical databases pave the way for ‘greener’ fracking fluids p.146
More cuts loom for US science as sequestration drags on... and on p.147
On the trail of the spy who chased lizards and frogs p.150


The grey wolf, protected for more than 30 years, could see its endangered status removed nationwide. 


CONSERVATION 


Grey wolves left 
out in the cold 


US plan to remove federal protection elicits howls of protest. 


BY CHRIS WOOLSTON 


Kentucky is coyote country.
But the 33-kilogram animal shot by a 
hunter near Munfordville this spring 

was definitely not a coyote. Its huge paws, 

broad snout and massive build suggested that 
it was a grey wolf (Canis lupus) — the first to be 
shot in Kentucky in more than 150 years. DNA 
tests confirmed the animal’s identity in August. 

The animal, a possible stray from hundreds 
of kilometres away in Michigan or Minnesota 

(although it cannot be ruled out that it was 


once captive), was also a player in a growing 
debate that mixes science, politics and pas- 
sionate public opinion. From Kentucky to 
California, wolves are forcing biologists and 
policy-makers to re-examine the US Endan- 
gered Species Act (ESA) and the very defini- 
tion of an ‘endangered’ species. 

The act, introduced in 1973, was a landmark
piece of legislation. Its purpose has been contentious
ever since, but it is intended to save
species “in danger of extinction throughout
all or a significant portion” of their range.
Although wolves have never been at risk of
extinction in the United States as a whole,
those in the 48 contiguous states were classified
as endangered in 1978.

After decades of federal protection and 
reintroduction programmes, the US Fish and 
Wildlife Service (FWS) undertook a compre- 
hensive review, which found that wolf popu- 
lations near the western Great Lakes and the 
northern Rocky Mountains had recovered suffi- 
ciently to warrant removing ESA protection (see 
‘Wolf pack’). (There are now about 4,000 wolves 
in the Great Lakes area and nearly 1,700 in the 
northern Rockies.) Wolves in these areas were 
‘delisted’ between May 2011 and August 2012. 

But in June this year, the FWS proposed 
removing ESA protection from all US grey 
wolves, citing the earlier review as evidence 
of their recovery and arguing that the original 
listing had erroneously included regions out- 
side the species’ historical range. The agency 
says that by delisting the rest of the US wolf 
population, it can concentrate its resources on 
ESA protection for the Mexican wolf (Canis 
lupus baileyi), a subspecies of the grey wolf. 

The proposal marks a turning point for the 
grey wolf. A century ago, the animals had been 
hunted almost out of existence south of the 
United States–Canada border. Now, as a result
of the partial delisting, six states have wolf- 
hunting seasons. In Montana, 225 wolves were 
legally trapped or shot in the 2012-13 season. 

Grey wolves removed from the ESA pro- 
gramme would be managed by states, some of 
which have in the past shown little interest in 
protecting wolves or expanding their territory. 
Delisting the wolves would essentially prevent 
them from reclaiming large parts of their his- 
toric range in places such as California, the 
southern Rockies and the northeast, says John 
Vucetich, a forest scientist at Michigan Tech- 
nological University in Houghton. And, as the 
appearance of the wolf in Kentucky suggests, 
pushing boundaries is a wolf speciality. 

“The Fish and Wildlife Service is essentially 
saying that this is the best that wolves can do, 
and it’s not even close,” he says. “Wolves are on 
the verge of setting a precedent for the Endangered
Species Act.”

Robert Wayne, an ecologist and evolution- 
ary biologist at the University of California, Los 
Angeles, says that wolves need broad ranges and 
large populations to return to their historic 


12 SEPTEMBER 2013 | VOL 501 | NATURE | 143 


© 2013 Macmillan Publishers Limited. All rights reserved 


| NEWS IN FOCUS 


WOLF PACK 


By October 2012, grey wolf populations in six 
US states had recovered such that they no longer 
needed protecting by the Endangered Species Act. 


Map key: Not protected; Endangered; Experimental population


levels of gene flow and diversity. In 2005,
he and his colleagues analysed mitochondrial 
DNA from specimens collected before wolves 
were decimated in the 1900s, and found that 
it contained twice as many variations as DNA 
from modern wolves (J. A. Leonard et al. Mol. 
Ecol. 14, 9-17; 2005). The researchers estimated 
that the wolf populations in Mexico and the 
western United States had once reached 380,000 
individuals. “Wolves have not recovered over a 
large part of their range,” Wayne says.


But Gary Frazer, assistant director for endan- 
gered species at the FWS in Arlington, Virginia, 
says that the service exceeded its own minimum 
targets for wolf recovery as early as 2001, and 
thus it is a case of mission accomplished. “That 
was the plan from the beginning: to declare 
recovery, to delist the species, and to move on 
to other species that need our attention,” he says, 
noting that the agency's resources are limited. 

Wolves might occupy only a fraction of 
their historic range, but they are not in dan- 
ger of extinction, adds Mark Boyce, a biolo- 
gist at the University of Alberta in Edmonton, 
Canada. “We have 6,000 wolves in Alberta 
alone,” he says. “Except for Mexican wolves, 
the populations in the lower 48 states add 
nothing to the genetic diversity of the species.” 
Boyce believes that any expansion of the 
wolves’ range would be costly for ranchers. 
In 2011, he co-authored a study that tracked 
wolves using the Global Positioning System, 
showing that each wolf pack in southwestern 
Alberta killed an average of 17 cattle every 
year (A. T. Morehouse and M. S. Boyce Front. 
Ecol. Environ. 9, 440-445; 2011). 

The wolf controversy highlights the strained 
relationship between science and politics. 
Vucetich and Wayne, along with Roland Kays 
of the North Carolina Museum of Natural 


Sciences in Raleigh, were, they claim, dropped 
in August from a panel to review the FWS pro- 
posal because they had publicly opposed the 
wolf’s delisting. “I’m not mad about not being
on the panel, but it doesn’t seem like they were
following proper procedure,” Wayne says. “It
was punitive,” he claims.

The review process has since been restarted. 
“We still haven’t figured out how to handle a
situation where experts have outspoken views,” 
Frazer says. “We are not an academic institu- 
tion. We're trying to implement federal law.” The 
public consultation period will close in October, 
but because the panel's peer review will not be 
complete by then, Frazer plans to reopen pub- 
lic comments in January 2014. “People are very 
passionate about wolves,” he says. The final decision
may take a year or more.

The future of US wolves will hinge mainly 
on public acceptance of their delisting. Groups 
such as Defenders of Wildlife in Washing- 
ton DC protest against wolf hunting, whereas 
those affiliated with hunters and ranchers want 
wolves to be aggressively controlled. Some indi- 
viduals have made death threats to ranchers 
who legally shot wolves that attacked livestock. 

Vucetich thinks that the government is eager 
to pass the issue on to the states. “It saps the 
energy of people working on it,” he says.


SOURCE: US FISH AND WILDLIFE SERVICE 


MATHEMATICS 


Physicists net fractal butterfly 


Decades-old search closes in on recursive pattern that describes electron behaviour.


BY DEVIN POWELL 


After a nearly 40-year chase, physicists
have found experimental proof
for one of the first fractal patterns
known to quantum physics: the Hofstadter 
butterfly. Named after Douglas Hofstadter, the 
Pulitzer prizewinning author of the 1979 book 
Gödel, Escher, Bach, the pattern describes the
behaviour of electrons in extreme magnetic 
fields. 

To catch the butterfly, scientists have had 
to fashion innovative nets. Since May, sev- 
eral groups have published experiments 
that sought the pattern using hexago- 
nal lattices of atoms; last month, oth- 
ers reported seeking it with atomic laser traps. 
Some physicists say that studying the pattern 
could help in the development of materials with 
exotic electric properties. But the main point 
of the chase was to check whether the butterfly 
looks as predicted. 

“Hofstadter’s concept was initially disturbing
to a lot of people,” says Cory Dean, an
experimental physicist at the City College of 
New York. “Now we can say his proposal wasn't
so crazy after all.”

Hofstadter’s butterfly describes electron motion.

Hofstadter, now a cognitive scientist at 
Indiana University Bloomington, sketched 
out the pattern in the 1970s while a graduate 
student in physics. It was known at the time 
that electrons under the influence of a mag- 
netic field would race around in circles. But 
Hofstadter found that in theory, if the electrons 
were confined inside a crystalline atomic lat- 
tice, their motion would become complicated. 
As the magnetic field was cranked up,

the energy levels that define the motion 
of electrons would split again and again. 
When represented on a graph, those energy 
levels revealed a pattern that looked like a but- 
terfly — and continued to do so, even when 
zoomed in to infinitely small scales. 
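
The repeated splitting can be reproduced numerically. The sketch below is only an illustration, not the method of any group mentioned in this article. It assumes the standard textbook form of Harper's equation for electrons hopping on a square lattice (the experiments described here use hexagonal lattices), sets the transverse phase to zero and uses a finite chain with open ends, so it approximates the pattern rather than reproducing it exactly. The function names `count_below` and `harper_levels` are hypothetical. Energies are in units of the hopping strength, and `alpha` is the magnetic flux through one lattice cell in units of the flux quantum.

```python
import math


def count_below(diag, x):
    """Count eigenvalues below x for a symmetric tridiagonal matrix.

    The matrix has the given diagonal and unit off-diagonal entries.
    The count is the number of negative pivots in the LDL^T
    factorisation of (matrix - x * identity), a Sturm-sequence argument.
    """
    count = 0
    pivot = 1.0
    for i, d in enumerate(diag):
        pivot = d - x - (1.0 / pivot if i > 0 else 0.0)
        if pivot == 0.0:
            pivot = 1e-30  # nudge an exact zero so the recursion can continue
        if pivot < 0.0:
            count += 1
    return count


def harper_levels(alpha, n_sites=60, tol=1e-9):
    """Energy levels of Harper's equation on an open chain of n_sites.

    Assumed model: E * psi[n] = psi[n+1] + psi[n-1]
                                + 2 * cos(2 * pi * alpha * n) * psi[n].
    Each level is located by bisection on the eigenvalue count.
    """
    diag = [2.0 * math.cos(2.0 * math.pi * alpha * n) for n in range(n_sites)]
    levels = []
    for k in range(n_sites):
        lo, hi = -4.0, 4.0  # Gershgorin bounds on every eigenvalue
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if count_below(diag, mid) > k:
                hi = mid
            else:
                lo = mid
        levels.append(0.5 * (lo + hi))
    return levels


if __name__ == "__main__":
    # Sweep a few rational flux values; plotting every level against
    # alpha for many fractions p/q traces out the butterfly.
    for p, q in [(0, 1), (1, 2), (1, 3), (1, 4), (2, 5)]:
        levels = harper_levels(p / q, n_sites=40)
        print(f"alpha = {p}/{q}: lowest {levels[0]:+.3f}, highest {levels[-1]:+.3f}")
```

Plotting the returned levels on the vertical axis against `alpha` on the horizontal axis, for many fractions between 0 and 1, gives the kind of graph described above. A longer chain and a sweep over the transverse phase would fill in the bands more faithfully.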

Mathematician Benoit Mandelbrot had 
yet to popularize the term ‘fractal’ for such 
recursive patterns, and Hofstadter’s adviser 
was unimpressed. “He scornfully called the
nesting pattern that this upstart youngster
claimed to see, ‘mere numerology’,” says
Hofstadter. “He even told me that I
would be unable to get a PhD for this
kind of work.” Hofstadter published his
description of the butterfly in 1976 (ref. 1), after
finishing his PhD.

The idea was difficult to test. The strength 
of the required magnetic field depends on 
the spacing between the atoms in the lattice. 
In conventional materials, in which atoms 
are separated by less than one-billionth of a 
metre, the pattern can emerge only in fields 
on the order of tens of thousands of tesla. The 
best available magnets can reach only about 


DOUGLAS HOFSTADTER 


100 tesla, and for just a fraction of a second. 

But smaller fields are sufficient in lattices 
with larger spacings, which can be created 
by layering materials in stacks. In May, 
researchers reported (ref. 2) that they had stacked
a single sheet of graphene, in which carbon
atoms are arranged like a honeycomb, on top
of a sheet of honeycombed boron nitride. 
The layers create a repeating pattern that 
provides a larger target for magnetic fields 
than the hexagons in each material — effec- 
tively magnifying the field. 
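
The scaling between lattice spacing and field strength can be checked with a rough calculation. The sketch below assumes the usual textbook criterion that the pattern becomes visible when roughly one magnetic flux quantum, h/e, threads a single lattice cell; the article itself does not state this formula. The function name is hypothetical, the 0.25-nanometre spacing is roughly that of graphene's atomic lattice, and the 14-nanometre period for a stacked superlattice is an illustrative assumption rather than a figure reported here.

```python
import math

PLANCK_H = 6.62607015e-34            # Planck constant, joule seconds
ELEMENTARY_CHARGE = 1.602176634e-19  # elementary charge, coulombs


def field_for_one_flux_quantum(spacing_m):
    """Field in tesla that threads one flux quantum h/e through one cell.

    Assumes a hexagonal lattice, whose unit cell has area
    (sqrt(3) / 2) * spacing**2 for lattice constant `spacing_m` in metres.
    """
    cell_area = (math.sqrt(3) / 2.0) * spacing_m ** 2
    return PLANCK_H / (ELEMENTARY_CHARGE * cell_area)


if __name__ == "__main__":
    atomic = field_for_one_flux_quantum(0.25e-9)   # atomic-scale spacing
    stacked = field_for_one_flux_quantum(14e-9)    # assumed superlattice period
    print(f"atomic lattice:  about {atomic:,.0f} tesla")
    print(f"stacked lattice: about {stacked:.1f} tesla")
```

Because the required field falls with the square of the spacing, a pattern that repeats over a distance tens of times larger than the atomic spacing brings the field from tens of thousands of tesla down to a range that laboratory magnets can reach.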

After applying a field, the researchers 
measured discrete changes in the conduc- 
tivity of the composite material — stepwise 
jumps that result from splits in the energy 
levels of its electrons. These were not a direct 
detection of the expected electron behav- 
iour, but were a proxy for it. Hofstadter’s 
butterfly had not quite flown into the net, 
but it had revealed its existence. “We found 
a cocoon,” says Pablo Jarillo-Herrero, an 
experimental physicist at the Massachu- 
setts Institute of Technology (MIT) in 
Cambridge. “No one doubts that there's a
butterfly inside.”

Nobel laureate Wolfgang Ketterle, another 
physicist at MIT, is going after the butterfly 
in a different way: by making atoms act like 
electrons. To do this, he chills rubidium 
atoms to a few billionths of a degree above 
absolute zero, and uses lasers to trap them in 
a lattice with egg-carton-like pockets. 

When zapped by an extra pair of criss- 
crossed lasers, the atoms tunnel from one 
pocket to another. Tilting the grid allows 
gravity to guide the atoms into paths that 
mimic the circular motions of an electron 
in a magnetic field — although no actual 
magnetic fields are involved. The system can 
easily track the motion of individual atoms, 
and should be able to mimic a magnetic field 
strong enough to produce a Hofstadter’s 
butterfly. “Cold atoms will give us an enormous
freedom,” says Ketterle, whose team
posted its study on the preprint server arXiv
last month (ref. 3). But the set-up has a liability: the
lasers tend to heat the cold atoms, limiting 
the ability to control the energies of the par- 
ticles and reveal the fractal pattern. 

Still, if the heat can be handled and the 
butterfly simulated, this system could be 
a starting point for exploring quantum 
behaviours in solids, such as materials that 
can conduct electricity on the surface but 
are insulators at the core. Dieter Jaksch, a 
physicist at the University of Oxford, UK, 
says, “I expect that a wealth of new phe- 
nomena and insights will be found when 
exploring the butterfly.”


1. Hofstadter, D. R. Phys. Rev. B 14, 2239–2249 (1976).
3. Miyake, H., Siviloglou, G. A., Kennedy, C. J., Burton, W. C. & Ketterle, W. Preprint at http://arxiv.org/abs/1308.1431 (2013).


ENVIRONMENTAL SCIENCES 




Hackles rise over 
privatization plan 


UK Natural Environment Research Council proposes to cut 
four institutes loose, but scientists fear for long-term data. 


BY DANIEL CRESSEY 


The UK Natural Environment Research
Council (NERC) is in a quandary. The 
government body, which channels 

money to environmental scientists, has for 
weeks been soliciting evidence on whether 
it should hand funding control of four of its 
five key research institutes to the private sec- 
tor. The move is meant in part to decrease the 
institutes’ reliance on waning government 
funds, but leading scientists have now gone 
public with their concerns that it could jeop- 
ardize research and data of crucial importance 
to environmental science in the United King- 
dom and around the world. 

At stake are the futures of the National 
Oceanography Centre, the British Geological 
Survey, the Centre for Ecology and Hydrol- 
ogy and the National Centre for Atmospheric 
Science. (The British Antarctic Survey, which 
NERC also runs, is not affected.) As well as con- 
ducting research on a variety of environmental 
topics, all four are closely linked to specialist 
centres that collect long-term data,
such as the British Oceanographic Data
Centre, hosted by the National Oceanography
Centre in Liverpool. In total, the institutes have a
budget of about £400 million (US$628 million).

“The NERC centres uniquely provide long- 
term consistent data, and make them freely 
available for the benefit of ecological science 
and to improve our understanding of the natural
world,” says William Sutherland, president
of the British Ecological Society in London. 
“These data include studies that are under- 
taken over the course of decades, protected 
from changes in fashion or the fluctuations of 
short-term demands. Any change in owner- 
ship of the centres must preserve this.” 

Helen Snaith, a remote-sensing researcher 
at the National Oceanography Centre in 
Southampton and a trade-union representa- 
tive, notes that advice that the centres provide 
to the government could be compromised 
if they start generating significant income 




from private sources. “There’s the potential 
for a very clear perceived conflict of interest,” 
she says. She also worries that the roughly 
1,750 members of staff at the four centres, 
about two-thirds of whom are researchers, 
could get a worse deal on pay and benefits 
under private ownership. 

Duncan Wingham, NERC’s chief executive, 
stresses that no decision has yet been taken. If 
the centres are moved out of the public sector, 
he says, it would not necessarily mean that they 
become profit-making. They could, for exam- 
ple, become part of universities. He has also 
emphasized that the decision on the centres’ 
futures will not consider cost savings, which 
most interested parties concede. 

There may also be advantages, adds Wing- 
ham — notably that freeing the institutes of 
public-sector constraints on pay and pro- 
motion, and from reliance on government 
funding, could give them better flexibility to 
respond to new opportunities. 

Steve Ormerod, an ecologist at Cardiff Uni- 
versity and chairman of the Royal Society for 
the Protection of Birds, acknowledges this. 
He sees advantages if the Centre for Ecology 
and Hydrology can develop partnerships on 
its own terms with international agencies and 
businesses, and says that being independent 
might allow the centre to win more funding 
and attract more researchers. 

But there are risks, he says. “We need safe- 
guards for these unique assets, skills and long- 
term, large-scale perspectives that have always 
provided crucial support for impartial, highly 
rigorous, evidence-based advice.”

NERC’s call for evidence on the proposal 
closed at the end of August, and submissions 
are being reviewed. The NERC board will 
decide on the institutes’ futures in December. If 
the research council does choose to divest itself 
of these centres, the decision would represent 
almost the end of an era for government- 
controlled science in the United Kingdom. 
According to its 2011-12 report, the UK Bio- 
technology and Biological Sciences Research 
Council has made arrangements to “remove 
[its] ability to exert control” over some of its 
institutes; and the Medical Research Council 
is transferring some of its in-house units to 
universities.
ENVIRONMENTAL CHEMISTRY


Secrets of fracking fluids 
pave way for cleaner recipe 


Disclosure of chemicals used in hydraulic fracturing will empower green chemistry. 


BY JEFF TOLLEFSON 


The myriad liquid concoctions used in
hydraulic fracturing make for quite a
recipe book. Since January 2011, FracFocus,
an online chemical-disclosure registry,
has assembled a list of the mixtures used at 
more than 52,000 oil and gas wells across the 
United States. In these data, geochemist Brian 
Ellis sees opportunity. He plans to mix differ- 
ent chemicals into oil- and gas-rich shale rock 
inside a pair of high-pressure chambers that 
he is building. This will allow him to explore 
the reactions that occur when these ‘fracking’ 
fluids are injected deep underground. 

The fluids, which are mixed with sand, are 
predominantly water, laced with 1% ‘special
sauce’. The recipes for that fraction (a mixture
that includes acids, solvents and corrosion
inhibitors) were until the last few years
secrets guarded by the companies that seek to 
penetrate shale formations to release stores of 
fossil fuels. But in the face of widespread con- 
cern about water contamination, 21 US states 
have adopted mandatory disclosure rules for 
the mixtures, making it easier for scientists 
such as Ellis, of the University of Michigan at 
Ann Arbor, to assess their impacts. 

Much of the data end up in registries such as 
FracFocus, which is overseen by state energy 
and water organizations (see ‘A recipe for 
fracking’). “There are still a lot of bugs, but the 
vast majority of companies are now disclosing 
their chemicals,” says Scott Anderson, a senior 
policy adviser for the Environmental Defense 
Fund in Austin, Texas, which advocates for 
greener fracking procedures. 

More than 500 companies have reported 
data to FracFocus so far. Academic research- 
ers, advocacy groups and companies are 
now poring over those recipes to assess their 
toxicity in the hope of narrowing them down 
to a group of environmentally acceptable ones,
and perhaps spurring the synthesis of even
greener alternatives. 

The boom in the disclosure of fracking- 
fluid components has occurred despite the 
fact that the federal government has yet to 
weigh in with its own rules. The Department 
of the Interior has proposed requiring the dis- 
closure of chemicals used during hydraulic- 
fracturing operations on public lands, but 
much of the current oil and gas development 


is taking place on private land. Many compa- 
nies are volunteering the information anyway, 
even in states that have no disclosure require- 
ments. And companies that do the hydraulic 
fracturing, such as Halliburton and Baker 
Hughes, both based in Houston, Texas, are 
developing their own chemical-assessment 
programmes in an apparent effort to address 
public concerns and reduce their environmen- 
tal footprint. 

The data in these registries, although 
increasingly abundant, remain incomplete, 
unconsolidated and difficult to compare. 
The European Union is phasing in a uni- 
fied chemical-regulation programme that 
governs reporting across all commercial 
sectors. Energy companies operating in the 
North Sea, for example, must all play by the 
same rules and abide by 
strict reporting require- 
ments. But in the United 
States, the regulations 
on chemical reporting 
remain a mixture of state
and national policies that vary by industry.

Even in states with disclosure laws, compa- 
nies can omit information in the interests of 
protecting intellectual property. For example, 
a subsidiary of ExxonMobil, based in Irving, 
Texas, declined to list the components of a gel- 
ling agent — used to help to suspend sand in 
water — at one of its wells in Wyoming, calling 
the information a “trade secret”. 

The result is that companies are still oper- 
ating under their own risk assessments and 
not disclosing all of the information that 
might be needed for independent verifica- 
tion. “If everybody has a different defini- 
tion of what is hazardous and doesn’t fully 
disclose the chemicals they use, then it is 
going to be awfully difficult to compare,” says 
Lauren Heine, co-director of Clean Produc- 
tion Action, an advocacy group based in 
Somerville, Massachusetts. 

Heine’s group is sifting through company 
disclosures to perform a risk assessment on 
the most commonly used chemicals. The 
effort is designed to provide a single point of 
comparison so that scientists, industry and
the public can make informed decisions about 
which chemicals are best. 

Daniel Durham, who heads a chemical- 
assessment programme at the Houston-based 
energy company Apache, says that although 
Heine’s effort is promising, companies do not 
need to wait. The US Environmental Protec- 
tion Agency (EPA) already maintains its own 
public registry of preferred chemicals for vari- 
ous industrial processes. Companies that want 
to register their chemicals provide the EPA 
with toxicity and environmental-assessment 
data; the registry also allows companies to 
keep certain data confidential if intellectual 
property is involved. 

The upshot is a growing — albeit incom- 
plete — list of preferred chemicals that com- 
panies such as Apache can choose from as 
they design their fracking fluids. A company 
that wants to avoid using a solvent such as 
ethylene glycol monobutyl ether, for example
(used to reduce viscosity but possibly
toxic to the endocrine system), could look
through the EPA list for alternatives. “It’s a
very good road map to green chemistry,”
Durham says. 

Eventually, Durham hopes that research- 
ers will help to develop novel chemicals that 
could be used to make the entire hydraulic- 
fracturing process cleaner and more efficient. 


Scientists such as Ellis could play an important
part.


A RECIPE FOR FRACKING

Once a well has been drilled and sealed off, companies inject
hydraulic fracturing fluids at high pressures to break up the
rock and allow oil and gas to flow. These fluids, which are
mostly water, are mixed with sand; this is used to prop
fractures open. Acids dissolve minerals and initiate cracks.
Gelling agents are used to suspend sand in the water, and
breakers delay breakdown of the gels. Friction reducers
lubricate the fissures. Pipes are protected by corrosion and
scaling inhibitors, biocides and chemicals that control
reactions with iron and clay.

Component              Share of fluid (%)
Water                  99.2
Other (total)          0.8
  Gellant              0.5
  Acid                 0.07
  Corrosion inhibitor  0.05
  Friction reducer     0.05
  Clay control         0.034
  Crosslinker          0.032
  Scale inhibitor      0.023
  Breaker              0.02
  Iron control         0.004
  Biocide              0.001

The specific fracking formula varies according to the company
responsible for the work and the geology of the region.


Ellis wants to know whether fracking fluids
are contributing to geochemical reactions
within the shale rock that might free up potentially
dangerous metals and radionuclides,
such as arsenic, barium, strontium and uranium.
These elements are often found in trace
concentrations in the waste water produced
by oil and gas companies, but can also be
found naturally in groundwater. Ellis eventually
hopes to help companies to select better
chemicals that would minimize the potential
for contamination and the need for waste-water
treatment. But for now, he says, he is
focused on the basic science. “Fundamentally,
I just want to understand those reactions a
little better.”


More cuts loom for US science 


Stalemate in Congress puts spending plans on hold. 


BY LAUREN MORELLO 


Laura Niedernhofer is counting her pennies.
The mid-career molecular biologist
moved last year to the Scripps Research
Institute’s campus in Jupiter, Florida, a risky
decision that saw her building a new laboratory
group at a time when the US government was
cutting its support for science. In June,
Niedernhofer abandoned one of her main lines of
research, reducing the toxicity of cancer
drugs, after the National Institutes of Health
(NIH) rejected her grant application. In July,
the agency approved a second grant, allowing
her to keep another research thrust alive, on
the molecular mechanisms of ageing. But the
NIH cut the award by 18%, preventing her from
hiring an additional postdoctoral researcher.

Niedernhofer is not alone. In a survey of more
than 3,700 US scientists released on 29 August
by the American Society for Biochemistry and
Molecular Biology in Rockville, Maryland,
one-third said that they had laid off researchers, and
close to two-thirds had seen their funding fall
since 2010. Federal spending on research and
development has declined by 16.3% since 2010, 
the fastest drop in a three-year period since the 
end of the space race in the 1970s, according 
to an analysis published on 3 September by the 
American Association for the Advancement of 
Science in Washington DC. 
The most drastic
reduction occurred
on 1 March, when
across-the-board
budget cuts known as
sequestration lopped
5% from the budgets
of most government
agencies. Science
powerhouses such as the NIH in Bethesda, 
Maryland, and the National Science Founda- 
tion in Arlington, Virginia, began to scrimp by 
reducing the values and durations of grants, and 
the number of recipients per application cycle. 

The situation could worsen in the coming 
months. Congress, which returned to Washington
DC this week, has made little progress
on setting government spending for the 2014
fiscal year, which begins on 1 October. An 
attempt by a group of Republican senators and 
the White House to negotiate an agreement on 
deficit reduction broke down in late August, 
and since then the crisis in Syria has diverted 
the attention of Congress. To avoid a govern- 
ment shutdown, lawmakers are expected to 
extend current funding levels until December. 
That extension, known as a continuing resolu- 
tion, would run out at about the same time that 
the country confronts another financial matter: 
surpassing its borrowing limit, or debt ceiling. 

That could set up a budget battle royal in the 
next few months. A similar fight in the sum- 
mer of 2011 led to the law that created seques- 
tration; it specifies annual spending reductions 
until 2021, if Congress does nothing to over- 
ride it. The next round of cuts, scheduled to 
take effect in January 2014, would trim spend- 
ing to 2% below the already-whittled-down 
2013 level. 

Indications of how the various science agencies
will fare can be found in Congress’s
unfinished spending bills. The
Democratic-controlled Senate would eliminate
sequestration and give science agencies
modest boosts. The Republican-controlled
House of Representatives would cut funding
in many areas to keep total spending
in line with the 2% cut that sequestration
prescribes. One relatively bright spot is the
House’s proposed allocation of $7.0 billion
for the National Science Foundation, the
same amount that the agency received in
2012, a bountiful level in House terms. But
Barry Toiv, vice-president for public affairs
at the Association of American Universities
in Washington DC, worries that House
Republicans will now end up seeking cuts
beyond 2%. “There is continuing pressure
for additional budget cuts as a price for raising
the debt ceiling,” he says.

12 SEPTEMBER 2013 | VOL 501 | NATURE | 147

© 2013 Macmillan Publishers Limited. All rights reserved

The situation leaves US research institu- 
tions in an uneasy position, unsure whether 
2013 funding levels will have been the 
nadir, or a prelude to something worse. 
Many are just beginning to feel the effects, 
because of the delay between the sequestra- 
tion cuts and grants being awarded. At the 
University of Maryland in College Park, the 
haul of grants from the NIH was 7% below 
projections in the 2012-13 academic year, 
and its share of defence-department cash 
was 3% lower than expected, says chief 
research officer Patrick O’Shea. At Johns 
Hopkins University in Baltimore, Mary- 
land, four doctoral programmes in the 
department for environmental health sci- 
ences each accepted just one student this 
year, instead of the usual two or three, says 
Jonathan Links, a medical physicist who 
handles the university’s crisis planning. 

But US universities collect tuition fees 
and generally have endowments, which 
means that they have ways to provide stop- 
gap funding to scientists in tight spots. At 
a ‘soft-money’ research institute such as 
Scripps, grants are needed to pay almost all 
the bills — so Niedernhofer’s situation was 
more dire. 

Although she is continuing her ageing 
research with her three postdocs, she has 
a new standard question she asks before 
hiring them: will they consider only academic
research as a job? “I can’t guarantee
that they will get that,” she says, “and I don’t
want to be the one to break their hearts.”


Ultimate upgrade for US synchrotron


Argonne lab banks on beam-bending magnets in bid for 
world’s most focused X-ray light source. 


BY EUGENIE SAMUEL REICH 


Every day, in dozens of synchrotrons
across the globe, electrons are whipped
around in circular storage rings
to provoke them into emitting X-rays, 
useful for imaging materials, identifying 
chemical-reaction products and determining 
crystal structures. 

But photon scientists do not want just any 
old storage ring. For more than a decade, they 
have dreamt of ‘ultimate’ storage rings — 
ones that use specialized magnets to produce 
X-ray beams that are as tightly focused 
as theory allows. 

Now, researchers at the largest US synchro- 
tron, the Advanced Photon Source (APS) at 
the Argonne National Laboratory in Illinois, 
are taking steps to develop this technology. In 
the process, they hope to leapfrog several inter- 
national facilities that have a head start. 

In Sweden, ultimate-storage-ring technology 
is being pioneered at MAX IV, a 528-metre- 
circumference synchrotron in Lund. Scientists 
there first sought to increase the intensity and 
brightness of the synchrotron’s X-ray light in 
2006 by focusing electron beams more tightly. 
The design relied on groups of seven magnets, 
known as multi-bend achromats, that could be 
used in as many as 20 places around the ring to 
nudge the paths of electrons back and forth until 
they lined up more-or-less perfectly. Machine 
director Mikael Eriksson recalls that when he 
toured US light sources to describe the project, 
“few believed it”.

[Photo caption: Magnets for the Swedish MAX IV synchrotron.]

Eriksson now has believers. In a report posted
online on 29 August, researchers at the Argonne
lab describe how they are hoping to upgrade the
1.1-kilometre-circumference APS with multi-bend
achromats (see go.nature.com/asxrqb).
“There’s a new technology that has come along
and it’s pretty revolutionary,” says APS director
Brian Stephenson. Current storage rings have
at most double-bend achromats, which contain
two magnets rather than seven. Physicists had
thought that including more magnets would
make the beam unstable by bending it too much
and introducing too many fluctuations. But the
work at MAX IV showed that very compact
magnets enable bending paths that are short
enough to stop fluctuations from building up.
The US Department of Energy, which funds 
the APS, still needs to approve the plan. In July, 
one of the department's advisory committees 
suggested that US labs were being left behind 
while other countries push towards ultimate 
storage rings. The committee had also recommended
pursuing a next-generation X-ray
laser, useful for making ‘molecular movies’ of
chemical reactions, among other things (see
Nature 500, 13-14; 2013). But such a laser
would have limitations: its strongly peaked
light pulses would destroy delicate materials.
Ultimate storage rings, by contrast, satisfy a
need for more gradually peaked pulses of light.

FOCUSED BEAMS
Five synchrotron facilities are developing special magnets so that they can become ultimate storage rings.
APS, Advanced Photon Source; ESRF, European Synchrotron Radiation Facility.

Researchers say that these storage rings 
could revolutionize X-ray imaging by making 
it possible to map evolving chemical processes. 
Current X-ray sources are not bright enough 
to track changes in materials with nanometre 
and nanosecond resolution, because there 
are not enough coordinated photons in the 
beams. Ultimate storage rings would change 
that. “A whole class of new problems opens
up,” says Paul Evans, a materials scientist at
the University of Wisconsin-Madison. For 
example, he says that the rings could be used 
to investigate what happens chemically and 
electrically at the interface between materials 
inside a battery as it runs out. 

The APS is seeking to tack the installa- 
tion of ultimate-storage-ring technology on 
to a separate upgrade that had already been 
approved. Cost calculations are still ongoing, 
but Stephenson hopes that the multi-bend 
achromats can be included without raising the 
upgrade budget much above US$391 million. 
MAX IV is implementing the technology for 
only 340 million Swedish kronor ($52 million), 
but that ring is smaller and the price tag would 
not include the overhead costs that are charged 
at US energy-department labs. 

After its upgrade, the APS could surpass 
MAX IV by approaching the theoretical limit 
for the most focused beam possible. The 
Swedish synchrotron will contain 20 multi-bend 
achromats, whereas the APS upgrade calls for 
around 40. In 2012, physicists at SLAC National 
Accelerator Laboratory in Menlo Park, Cali- 
fornia, showed that the number of multi-bend 
achromats around a larger ring could be pushed 
even higher without fundamentally destabiliz- 
ing the electron beam. “The key is to make the 
bending gentle,” says Yunhai Cai, head of beam 
physics at SLAC. 


Alongside APS, the European Synchrotron 
Radiation Facility (ESRF) in Grenoble, France, 
has also opted for a multi-bend-achromat 
upgrade, after a working group concluded last 
October that the technology was affordable. 
ESRF director-general Francesco Sette says that 
accelerator physicists there showed that multi- 
bend achromats could work with the facility’s 
existing injector, a part of the machine that 
supplies extra electrons to the main ring a few 
times each day. He had previously thought that a 
new injector would be needed. “We are today in 
full swing to launch as soon as possible,” he says. 

Storage rings in Brazil and Japan will also be 
upgraded with multi-bend achromats, giving 
MAX IV a window of only one year from its 
projected completion date of 2015 before it faces 
competition (see ‘Focused beams’). 

Some have suggested that particle-physics 
tunnels, too, could eventually be turned into 
light sources with multi-bend achromats. 
SLAC has an idle 2.2-kilometre-circumference 
tunnel that originally housed a particle acceler- 
ator used to compare the decay rates of matter 
and antimatter. And a 6.3-kilometre tunnel 
used by the now-closed Tevatron particle 
accelerator at Fermilab near Batavia, Illinois, 
is another candidate for conversion. Eriksson 
says that building ultimate storage rings of that 
size would not be realistic for Sweden, given 
the relative size of its science budget. 

He knows that Sweden's time in the vanguard 
will be short-lived, and has mixed feelings 
about seeing other countries adopting the tech- 
nology that he and his colleagues pioneered so 
enthusiastically. “We are both happy and a little
sorry,” he says.


CORRECTION 

The News story ‘NASA ponders Kepler’s 
future’ (Nature 501, 16-17; 2013) inflated 
the size of asteroids that the probe could 
watch for — they would be several hundred 
metres in diameter rather than several 
hundred kilometres. 
THE SPY WHO LOVED FROGS


To track the fate of threatened species, a young scientist must 
follow the jungle path of a herpetologist who led a secret double life. 


BY BRENDAN BORRELL

Before leaving for the Philippines as an
undergraduate in 1992, Rafe Brown
scoured his supervisor’s bookshelf to
learn as much as he could about the
creatures he might encounter. He flipped
through a photocopy of a 1922 monograph
by the prolific herpetologist Edward Taylor,
and became mesmerized by a particular lizard,
Ptychozoon intermedium, the Philippine
parachute gecko. With marbled skin, webs
between its toes and aerodynamic flaps along
its body that allow it to glide down from the
treetops, it was just about the strangest animal
that Brown had ever seen.

Brown learned that Taylor had collected the
first known example, or type specimen, near
the town of Bunawan in 1912, and had deposited
it at the Philippine Bureau of Science in
Manila. But the specimen had been destroyed
along with the building during the Second 
World War, and the species had never been 
documented again in that part of the country. 
“What are the chances I’m going to see one of 
the rarest geckos in the world?” he wondered. 

He was driven by more than curiosity. Given 
the rampant deforestation in that part of the 
Philippines, he wanted to determine whether 
the species still existed there and if so, how simi- 
lar it was to geckos collected in other areas. He 
wanted to see, in other words, whether Taylor's 
70-year-old taxonomic decisions were still valid. 

On their first night in the field, Brown and 
his colleagues drove to the edge of the forest 
and caught two red eyes in the beam of a 
headlamp. It was a Ptychozoon. Back at their
hotel, Brown photographed the gecko, took 
tissue samples for DNA sequencing, and care- 
fully prepped it and stuck it in a jar. It became 
the neotype to replace Taylor’s lost specimen, 
and in 1997, Brown published a new description
of the species¹. It marked the start of an
obsession. 

As Brown made his career studying bio- 
diversity in the Philippines over the next two 
decades, he could not escape Taylor’s long 
shadow. The elder herpetologist had logged 
23 years in the field over his lifetime, collect- 
ing more than 75,000 specimens around the 
world, and naming hundreds of new species. 

There is a darker side to Taylor's legacy, 
however. He was a racist curmudgeon beset by 
paranoia — possibly a result of his mysterious 
double life as a spy for the US government.
He had amassed no shortage of enemies by 
the time he died in 1978. An obituary noted 
that he was, to many, “a veritable ogre—and 
woe to anyone who incurred his wrath”².
More damaging, perhaps, were the attacks 
on his scientific reputation. After the loss of 
his collection in the Philippines, many of the 
species he had named were declared invalid 
or duplicates. The standards of taxonomy had 
advanced beyond Taylor's quaint descriptions, 
and without the specimens to refer to, his evi- 
dence seemed flimsy. 

Nevertheless, Brown felt a connection with 
his maligned predecessor. It was a bond that 
intensified when, in 2005, Brown became cura- 
tor of herpetology at the University of Kansas 
Natural History Museum in Lawrence, the same 
institution at which Taylor had spent much of 
his career. Over the years, Brown has rebuilt 
some of Taylor’s collection and resurrected 
many of his species. Now, as he finishes a major 
monograph on a group of Philippine frogs, he 
is more convinced than ever: “Taylor was right.” 

Brown's reassessment could prove crucial. 
Since Taylor’s time, taxonomy has become 
more than just a naming exercise. Designating 
a group of organisms as a new species, or 
lumping it in with an old one, can affect the
animals’ legal protection and influence the
allocation of scarce conservation resources.
Amphibian declines, in particular, have made
headlines around the world, and the Philippines
ranks second only to Sri Lanka for sheer
proportions of imperilled species: 79% of Philippine
amphibians are found nowhere else on
Earth, and 46% are under threat of extinction.
But following Taylor’s trail has given Brown
cause for optimism. “A lot of the things people
thought were extinct,” he says, “if you go right
where Taylor said to go, you can find them.”

[Photo caption: Edward Taylor.]
[Photo caption: Taylor’s Red Cross ID, used in Siberia in 1918.]
[Photo caption: The Philippine parachute gecko.]
[Photo caption: The Philippine Bureau of Science in Manila was destroyed in the Second World War.]


A LUST FOR ADVENTURE

On the fourth floor of the Kansas museum,
Brown is walking through the herpetological
collections. Lizards float upside down in yellow-tinged
alcohol. Snakes coil like corkscrews,
and two dozen tiny, dark frogs embrace in a
specimen jar. On one shelf, the jars have red
ribbons tied around their lids to signify that
their contents are type specimens: the standards
on which species descriptions are based.

When scientists disagree on whether some- 
thing is a new species or a variant of a known 
one, they often need to refer back to the type 
specimen or even return to where it was col- 
lected. Brown opens a jar and extracts a small 
lizard that has a tin tag tied to its waist with 
twine. It is one of Taylor's originals, on loan 
from the California Academy of Sciences 
in San Francisco. “Preserved properly, well 
labelled and deposited in a safe institution,” 
says Brown, “these will last forever.” 

That is the kind of legacy to which every 
taxonomist aspires, and Taylor was no excep- 
tion. Born in Maysville, Missouri, on 23 April 
1889, he was still a teenager when he began 
depositing specimens at this museum. At 23, 
he joined the civil service and became what he 
called “a one-man Peace Corps” in the Phil- 
ippines — then a US territory — setting up a 
school for members of a headhunting tribe 
in central Mindanao, where he collected the 
parachute gecko among other species. Next, he 
worked for the fisheries department in Manila 
and then completed his PhD on Philippine 
mammals, but his true passion was always herpetology.
It came at the expense of just about
everything else in his life. “I named about
500 species,” he would later tell a reporter, “but
I can’t always remember the names of my own
children.” His wife, Hazel, could not bear his
long absences, and they divorced in 1925.

By then, Taylor had described more species 
than most of his peers could achieve in a life- 
time: 42 amphibians, 40 lizards and 30 snakes. 
He sold some of his specimens to museums in 
the United States, but many remained at the 
Bureau of Science in Manila, where he thought 
they would be secure forever. He joined the 
faculty at Kansas in 1926, and over the next 
two decades he wandered the globe from Mex- 
ico and Costa Rica to parts of Africa, lugging 
a folding army cot and subsisting on rice and 
evaporated milk as he collected specimens. 

In his 60s, however, Taylor found himself 
under attack. In 1954, Robert Inger, a herpetologist
at the Field Museum in Chicago, Illinois,
published a withering taxonomic review
of Philippine amphibians³. Inger, who studied
only specimens in museums, axed 44 of the 87 
species that Taylor had personally named or 
approved. “The differences between Taylor’s 
frogs will be recognized as the differences to 
be expected between individuals,” Inger wrote.
In other words, Taylor was a hack. On his per- 
sonal copy of Inger’s text, Taylor scribbled the 
word, “Hooey.” 

More recently, herpetologists 
have levelled other serious allega- 
tions against Taylor's character. In 
1993, the Kansas Herpetological 
Society posthumously published 
his 1916 master’s thesis on Kansas 
reptiles. In a foreword, one of his 
former students, Hobart Smith, revealed that 
Taylor had plagiarized large sections from the 
nineteenth-century palaeontologist and her- 
petologist Edward Drinker Cope. For those 
who knew Taylor as a man of principle, it was 
a devastating revelation, but it also explained 
why Taylor had never tried to publish the work 
himself. Then, in 2002, herpetologist Jay Sav- 
age at the University of Miami in Coral Gables, 
Florida, charged that Taylor had secretly cop- 
ied the field notes of a rival in order to scoop 
him on his next collecting trip to Costa Rica⁴.

Taylor had other demons. He had voiced 
support for eugenics programmes and report- 
edly refused to take on Jewish students. Brown 
makes no apologies for the man, but Taylor's 
reputation — for good or ill — is intertwined 
with the history of the Kansas museum. “In the 
end, we consider him our own,” says Brown.


A LEGACY REVISITED

Brown’s interest in Taylor grew when he was 
a graduate student at the University of Texas 
at Austin in the late 1990s. He devoured Tay- 
lor’s monographs to plan his own collecting. 
He hunted through museum records to find 
out where Taylor's specimens were, and made 


visits to see them at the Field Museum and the 
California academy. But time and time again, 
he came to a dead end when he wanted infor- 
mation on type specimens that Taylor had 
deposited at the Philippine Bureau of Science. 

He soon learned the tragic story of that insti- 
tution: in February 1945, when US General 
Douglas MacArthur launched an all-out attack 
on Manila to expel the Japanese occupiers, the 
Bureau of Science was reduced to rubble, and 
all of its botanical and zoological specimens 
were destroyed, including 32 of Taylor’s type 
specimens. “The loss is an irreplaceable one,” 
Taylor’s friend Elmer Merrill, a legendary botanist,
wrote in Science⁵. Plant specimens were
gradually replenished, but no one had system- 
atically tried to replicate Taylor’s efforts. For 
many years, hostile tribes kept most interlopers 
away from species-rich regions. In the 1990s, 
threats of terrorism made it difficult to access 
places such as the Sulu Archipelago, where 
Taylor collected types for a dozen species. 
Despite the danger, Brown resolved to retrace 
Taylor's steps. 

In July 1998, he hired a boy to guide his 
team through the mountains of northern 
Luzon Island. It was the same place where Tay- 
lor had been ambushed by a machete-wielding 
native in a loincloth. While Brown tromped 
through streams on his quest, a rumour spread 
through a town below that Westerners had
kidnapped the boy. A dozen locals took up
torches, canes and machetes and marched to 
the home of the village chief on their way to 
find the kidnappers. When Brown returned, 
he defused the situation by producing sacks of
amphibians — his only captives. Taylor, when 
ambushed, had produced a rifle. 

During the 1998 trip, Brown and his collab- 
orators found 5 species of reptiles and amphib- 
ians not seen for many decades; 13 potentially 
new to science; and 30 never before reported 
from the region⁶. One night, Brown caught
several Platymantis frogs making an insect- 
like chirp high in the trees. They turned out to 
be from a species that Taylor had caught and 
named rivularis in 1920. The type specimen 
still existed, but it was bleached of colour and 
in pretty bad shape, and there were not many 
other examples to examine. Accordingly, Inger 
had lumped rivularis in with another species, 
hazelae (named after Taylor's wife). But after 
hearing its mating calls and seeing its col- 
ours in life, Brown decided he would resur- 
rect P. rivularis as its own species. Inger, says 
Brown, favoured more inclusive groupings and 
was draconian in his decisions. “If he had any
doubt, he would sink a species.” 
Over the past two decades, Brown and his
close collaborator Arvin Diesmos, a herpe- 
tologist at the National Museum of the Phil- 
ippines in Manila, have collected more than 
15,000 Philippine specimens — about one- 
fifth of Taylor’s lifetime haul. To establish 
evolutionary relationships, Brown also col- 
lects DNA, which cannot be extracted from 
Taylor’s formaldehyde-preserved specimens, 
and he records frog mating calls, a key tool for 
identifying species. By the time he has finished 
his own review of the Philippine Platymantis 
frogs, which has been in the works since 2003, 
he expects to have doubled the number of 
species from 30 to 60, resurrecting many of 
Taylor's names. 


CROAK AND DAGGER 

Brown’s fascination with Taylor has gone 
beyond taxonomy. Early in his research, he 
became intrigued by “herpetology gossip” 
about Taylor’s extracurricular activities. As 
he trotted around the globe, Taylor seemed 
to be conducting field work in conflict zones 
and, in his memoirs, he alluded to duties outside
science⁷. While working for the fisheries
department in Manila, he helped to investi- 
gate the murder of an Englishman, traded tips 
with the Swedish secret service and scouted for 
mercury that could be used in munitions dur- 
ing the First World War. On his river journeys, 
he occasionally noticed Japanese 
people, and warned the local gov- 
ernor that they were “spying out 
of the land”. 

It was never clear to Taylor’s 
few confidantes whether he used 
wars as an excuse to get into the 
field, or vice versa. In his obituary, 
a former student suggested that Taylor’s later 
activities during the Second World War “probably
will never be known in detail”².

But the true nature of Taylor’s work is finally 
coming into focus as intelligence records are 
declassified and research materials surface. 
They reveal that Taylor was indeed a spy, and 
that he continued to do intelligence work after 
the First World War, when he was sent to Sibe- 
ria. His official purpose was to join the Red 
Cross to stop a typhus epidemic, but he was 
also gathering information on the Communist 
revolt in Russia and, later, the fate of grand 
duchess Anastasia, daughter of murdered tsar 
Nicholas II. 

Taylor was called to duty again in 1944, 
when he was 54 and war raged in the Pacific. 
According to records in the US National 
Archives, he joined the Office of Strategic 
Services (OSS), a precursor to the Central 
Intelligence Agency (CIA), to train agents in 
Sri Lanka — then a British territory that pro- 
vided ready access to Myanmar, Malaysia, 
Indonesia and other areas that the Japanese 
had infiltrated. Scientific work, an OSS officer 
explained to one of Taylor’s superiors, was 
“excellent cover”. 
Taylor taught jungle survival at Camp Y,
a steamy settlement on the coast. With a 
penetrating stare and a lantern jaw, he seemed 
more imposing than his 1.8 metres. In his 
spare time, he occasionally dodged gunfire 
to nab specimens, which he studied for two 
monographs published after the war. “Have 
just described five new forms of blind snakes 
from the island,” he wrote to S. Dillon Ripley, a
young ornithologist who served with him and 
would later lead the Smithsonian Institution 
in Washington DC. In a later letter, he offered 
“some 500 species” of mollusc shells to the 
Smithsonian. 

After the war, Taylor helped the British in 
Malaysia to investigate Japanese war crimes 
against civilians. His work documenting rape, 
torture and murder may have contributed to 
his antipathy towards the Japanese people. 
Never an easy-going person, his experiences 
at war seem to have wounded him. He failed in 
a bid to become head of the Kansas museum, 
and grew increasingly paranoid in daily 
life. He studied Russian and made inquiries 
about working for the CIA. Smith, who died 
in March this year, told Nature that Taylor 
sprinkled flour on the floor of his office to 
detect trespassers during his absences. “I was 
wary of him,” said Smith. William Duellman, a
herpetologist at the University of Kansas who 
first met Taylor in 1951, thinks that Taylor's 
symptoms could today meet the standards 
of post-traumatic stress disorder. Neverthe- 
less, Taylor kept working. In his later years, 
he studied a group of poorly known, legless 
amphibians called caecilians. He published 
a sprawling, 800-page taxonomic review⁸ of
them in 1968. 


ON THE BRINK 

Taylor’s herpetological legacy in the Philip- 
pines has taken on new importance now that 
the country has lost more than 95% of its native 
forest. Species collectors such as Brown know 
that their work has conservation implications, 
but there are often differences between scien- 
tific studies and conservation classifications. In 
the late 1990s, for example, the International 
Union for Conservation of Nature (IUCN) 
labelled the Polillo Island frog — Platymantis 
polillensis, first described by Taylor — as 
critically endangered. All but 4 square kilo- 
metres of Polillo’s forests had been razed for 
coconut plantations. 

But in 2004, Brown was listening to his 
recordings when he noticed that the Polillo 
frog had a mating call similar to that of a frog 
that he had collected on Luzon. Brown applied 
for permission to get genetic samples from 
Taylor’s original collecting site, and confirmed 
his hunch: the frog is widespread. Last year, he 
reported that seven frog species once considered 
vulnerable or endangered by the IUCN 
are actually widespread on Luzon. 

The challenge for taxonomists is that 
although many agree that global biodiversity 
is in crisis, threat levels are hard to gauge 
accurately because advocates for every taxon 
and ecosystem are clamouring for attention 
and real data are scarce. “Global threat assess- 
ments for large taxonomic groups is a very 
inexact science,” says Walter Jetz, an ecologist 
at Yale University in New Haven, Connecticut. 
“We need more boots on the ground.” 

Brown is sceptical about conservation 
assessments in general, but one threat to Phil- 
ippine amphibians does concern him: the 
chytrid fungus Batrachochytrium dendroba- 
tidis, which has been linked to the decline or 
extinction of hundreds of amphibian species 
around the world (see Nature 465, 680-681; 
2010). In 2009, Brown identified the fungus 
on five species in the Philippines, and it has 
since been found on more. The chytrid threat, 
he says, combined with habitat destruction 
and climate change, could push Philippine 
amphibians over the edge. 

Time is running out to document the 
biodiversity of the Philippines, but also to 
determine Taylor’s place in history. Brown 
has found that Taylor’s species descriptions, 
although brief, often zeroed in on the pre- 
cise trait that set one group apart from its 
relatives. “He had a sharp eye,” says Brown. 
More than a dozen species whose names were 
erased by Inger and others have proved to be 
valid after all. 

Inger, who is 93, is impressed by the emerg- 
ing evidence and the way that Brown has 
approached the subject. 
“I think he’s probably 
right,” he says, but adds, 
“I’m still a little uneasy 
about over-fragmentation.” 

Back at the University of Kansas, Brown 
takes a seat inside an archival library and 
dips once more into some of Taylor’s work, 
including the battered leather books that 
the man used for his field notes and speci- 
men catalogues. Paging through one of those 
catalogues for the first time, Brown is stunned 
to find that Taylor had crossed out the name 
attached to an Asian spadefoot toad that he 
caught on Mindoro Island — a strange, gan- 
gly creature that crawls rather than hops. 
Next to it, Taylor had written, “new sp!!”. As 
recently as 2009, Brown had designated it as 
a new species, Leptobrachium mangyanorum, 
because it was so different from previously 
described relatives. 

“Ed was way ahead of us,” says Brown. “Why 
he never named it, we'll never know. But it’s 
pretty satisfying to come along 90 to 100 years 
later and arrive at the same conclusion.” 


Brendan Borrell is a biologist-turned- 
journalist based in New York. He contributed 
to a 2007 review paper on gliding animals, 
which also included work by Brown. 

PHYSICISTS HAVE 
SPENT A CENTURY 
PUZZLING OVER 
THE PARADOXES OF 
QUANTUM THEORY. 
NOW A FEW OF THEM 
ARE TRYING TO 
REINVENT IT. 


BY PHILIP BALL 

If the truth be told, few physicists have ever really felt comfortable with quantum 
theory. Having lived with it now for more than a century, they have managed 
to forge a good working relationship; physicists now routinely use the mathematics 
of quantum behaviour to make stunningly accurate calculations about 
molecular structure, high-energy particle collisions, semiconductor behaviour, 
spectral emissions and much more. 

But the interactions tend to be strictly formal. As soon as researchers try to get 
behind the mask and ask what the mathematics mean, they run straight into a seemingly 
impenetrable wall of paradoxes. Can something really be a particle and a wave 
at the same time? Is Schrödinger’s cat really both alive and dead? Is it true that even 
the gentlest conceivable measurement can somehow have an effect on particles half- 
way across the Universe? 

Many physicists respond to this inner weirdness by retreating into the ‘Copen- 
hagen interpretation’ articulated by Niels Bohr, Werner Heisenberg and their col- 
leagues as they were putting quantum theory into its modern form in the 1920s. 
The interpretation says that the weirdness reflects fundamental limits on what can 
be known about the world, and just has to be accepted as the way things are — or, as 
famously phrased by physicist David Mermin of Cornell University in Ithaca, New 
York, “shut up and calculate!” 

But there have always been some who are not content to shut up — who are 
determined to get behind the mask and fathom quantum theory’s meaning. “What 
is it about this world that forces us to navigate it with the help of such an abstract 
entity?” wonders physicist Maximilian Schlosshauer of the University of Portland 
in Oregon, referring to the uncertainty principle; the wave function that describes 
the probability of finding a system in various states; and all the other mathematical 
paraphernalia found in textbooks on quantum theory. 

Over the past decade or so, a small community of these questioners have begun 
to argue that the only way forward is to demolish the abstract entity and start again. 
They are a diverse bunch, each with a different idea of how such a ‘quantum 
reconstruction’ should proceed. But they share a conviction that 
physicists have spent the past century looking at quantum theory from 
the wrong angle, making its shadow odd, spiky and hard to decode. 
If they could only find the right perspective, they believe, all would 
become clear, and long-standing mysteries such as the quantum nature 
of gravity might resolve themselves in some natural, obvious way — per- 
haps as an aspect of some generalized theory of probability. 

“The very best quantum-foundational effort,” says Christopher Fuchs 
of the Perimeter Institute for Theoretical Physics in Waterloo, Canada, 
“will be the one that can write a story — literally a story, all in plain 
words — so compelling and so masterful in its imagery that the math- 
ematics of quantum mechanics in all its exact technical detail will fall 
out as a matter of course”. 


A VERY REASONABLE PROPOSAL 

One of the earliest attempts to tell such a story came in 2001, when 
Lucien Hardy, then at the University of Oxford, UK, proposed that 
quantum theory might be derived from a small set of “very reasonable” 
axioms about how probabilities can be measured in any system, such 
as a coin tossed into the air. 

Hardy began by noting that a classical system can be specified 
completely by measuring a certain number of ‘pure’ states, which he 
denoted N. For a coin toss, in which the result is always either heads or 
tails, N equals two. For the roll of a dice, whereby the cube must end up 
with one of six faces uppermost, N equals six. 

Probability works differently in the quantum world, however. 
Measuring the spin of an electron, for example, can distinguish two 
pure states, which can be crudely pictured as a rotation clockwise or 
anticlockwise around, say, a vertical axis. But, unlike in the classical 
world, the electron’s spin is a mixture of the two quantum states before 
a measurement is made, and that mixture varies along a continuum. 
Hardy accounted for that through a ‘continuity axiom’, which demands 
that pure states transform from one to another in a smooth way. This 
axiom turns out to imply that at least N² measurements are required to 
completely specify a system — a relationship that corresponds to the 
standard quantum picture. 

But, in principle, said Hardy, the continuity axiom also allows for 
higher-order theories in which a complete definition of the system 
requires N³, N⁴ or more measurements, resulting in subtle deviations 
from standard quantum behaviour that might be observable in the lab. 
He did not attempt to analyse such possibilities in any detail, however; his 
larger goal was to show how quantum physics might be reframed as a gen- 
eral theory of probability. Conceivably, he says, such a theory could have 
been derived by nineteenth-century mathematicians without any knowl- 
edge of the empirical motivations that led Max Planck and Albert Einstein 
to initiate quantum mechanics at the start of the twentieth century. 
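
The counting behind Hardy's argument can be illustrated with a short Python sketch. It assumes the textbook bookkeeping in which a quantum state of an N-level system is an N × N Hermitian matrix (normalization is ignored here), and it treats a theory's 'order' as the exponent r in N to the power r. The function names are illustrative and are not taken from Hardy's paper.

```python
def hermitian_real_parameters(n: int) -> int:
    """Count the real numbers needed to write down an n x n Hermitian matrix."""
    diagonal = n                              # diagonal entries are real
    off_diagonal_pairs = n * (n - 1) // 2     # entries above the diagonal
    # Each entry above the diagonal has a real and an imaginary part;
    # the entries below the diagonal are fixed by conjugate symmetry.
    return diagonal + 2 * off_diagonal_pairs


def measurements_needed(n: int, order: int) -> int:
    """Number of measurements, N**order, in a theory of the given order."""
    return n ** order


for n in (2, 3, 6):
    classical = measurements_needed(n, 1)     # coin toss, die roll
    quantum = measurements_needed(n, 2)       # standard quantum theory
    higher = measurements_needed(n, 3)        # a hypothetical higher-order theory
    assert quantum == hermitian_real_parameters(n)
    print(n, classical, quantum, higher)
```

For the electron spin discussed above (N equals two), the sketch gives two numbers for a classical description and four for the quantum one.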

Fuchs, for one, found Hardy’s paper electrifying. “It hit me over the 
head like a hammer and has shaped my thinking ever since,” he says, 
convincing him to pursue the probability approach wholeheartedly. 

Fuchs was especially eager to reinterpret the troubling concept of entan- 
glement: a situation in which the quantum states of two or more particles 
are interdependent, meaning that a measurement of one of them will 
instantaneously allow the measurer to determine the state of the other. 
For example, two photons emitted from an atomic nucleus in opposite 
directions might be entangled so that one is polarized horizontally and the 
other is polarized vertically. Before any measurement is made, the polarizations 
of the photons are correlated but not fixed. Once a measurement 
on one photon is made, however, the other also becomes instantaneously 
determined — even if it is already light years away. 

As Einstein and his co-workers pointed out in 1935, such an instanta- 
neous action over arbitrarily large distances seems to violate the theory 
of relativity, which holds that nothing can travel faster than light. They 
argued that this paradox was proof that quantum theory was incomplete. 

But the other pioneers stood fast. According to Erwin Schrödinger, 
who coined the term ‘entanglement’, this feature is the essential trait of 
quantum mechanics, “the one that enforces its entire departure from 
classical lines of thought”. Subsequent analysis has resolved the paradox, 
by showing that measurements of an entangled system cannot actually 
be used to transmit information faster than light. And experiments on 
photons in the 1980s showed that entanglements really do work this way. 

Still, this does seem an odd way for the Universe to behave. And 
this is what prompted Fuchs to call for a fresh approach to quantum 
foundations. He rejected the idea, held by many in the field, that wave 
functions, entanglement and all the rest represent something real out in 
the world (see Nature 485, 157-158; 2012). Instead, extending a line of 
argument that dates back to the Copenhagen interpretation, he insisted 
that these mathematical constructs are just a way to quantify “observers’ 
personal information, expectations, degrees of belief”. 

He is encouraged in this view by the work of his Perimeter Institute 
colleague Robert Spekkens, who carried out a thought experiment asking 
what physics would look like if nature somehow limited what any observer 
could know about a system by imposing a “knowledge balance principle”: 
no observer's information about the system, as measured in bits, can ever 
exceed the amount of information he or she lacks. Spekkens’s calculations 
show that this principle, arbitrary as it seems, is sufficient to reproduce 
many of the characteristics of quantum theory, including entanglement. 
Other kinds of restriction on what can be known about a suite of states 
have also been shown to produce quantum-like behaviours. 
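
A toy calculation gives a feel for how such a knowledge limit works. The Python sketch below is a simplified illustration, not Spekkens's own formalism: it assumes a system with four possible underlying states (two bits specify it fully) and considers only states of knowledge that amount to a whole number of bits. All names are hypothetical.

```python
from itertools import combinations
from math import log2

ONTIC_STATES = (1, 2, 3, 4)           # two bits fully specify the toy system
TOTAL_BITS = log2(len(ONTIC_STATES))


def knowledge_bits(candidates):
    """Bits known when the state is narrowed down to `candidates`."""
    return TOTAL_BITS - log2(len(candidates))


def obeys_balance(candidates):
    """True if the bits known do not exceed the bits still lacking."""
    known = knowledge_bits(candidates)
    return known <= TOTAL_BITS - known


allowed = [
    subset
    for size in (1, 2, 4)             # whole-bit states of knowledge only
    for subset in combinations(ONTIC_STATES, size)
    if obeys_balance(subset)
]
maximal = [s for s in allowed if knowledge_bits(s) == 1]
print(len(maximal), maximal)
```

Under these assumptions, pinning the system down to a single underlying state is forbidden, and the most an observer can hold is one of six states of knowledge that each leave one bit unknown.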


KNOWLEDGE GAP 

The lesson, says Fuchs, isn’t that Spekkens’s model is realistic — it was 
never meant to be — but that entanglement and all the other strange 
phenomena of quantum theory are not a completely new form of physics. 
They could just as easily arise from a theory of knowledge and its limits. 

To get a better sense of how, Fuchs has rewritten standard quantum 
theory into a form that closely resembles a branch of classical probability 
theory known as Bayesian inference, which has its roots in the eighteenth 
century. In the Bayesian view, probabilities aren't intrinsic quantities 
‘attached’ to objects. Rather, they quantify an observer's personal degree 
of belief of what might happen to the object. Fuchs’ quantum Bayesian 
view, or QBism (pronounced ‘cubism’), is a framework that allows 
known quantum phenomena to be recovered from new axioms that do 
not require mathematical constructs such as wavefunctions. QBism is 
already motivating experimental proposals, he says. Such experiments 
might reveal, for example, new, deep structures within quantum mechan- 
ics that would allow quantum probability laws to be re-expressed as 
minor variations of standard probability theory. 
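
The Bayesian idea of probability as a degree of belief can be shown with an ordinary coin. The Python sketch below is classical Bayesian updating only, not Fuchs's quantum formalism; the two hypotheses and their numbers are invented for illustration.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return posterior degrees of belief from a prior and the likelihood of the data."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# An observer is unsure whether a coin is fair or biased towards heads.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}

# Hypothetical chance of seeing heads under each hypothesis.
heads_likelihood = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}

# After one observed head, belief shifts towards the biased hypothesis.
posterior = bayes_update(prior, heads_likelihood)
print(posterior)
```

Nothing about the coin changes when the head is observed; only the observer's beliefs do, which is the sense in which the probabilities are not 'attached' to the object.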

“That new view, if it proves valid, could change our understanding 
of how to build quantum computers and other quantum-information 
kits,” he says, noting that all such applications are critically dependent 
on the behaviour of quantum probability. 

Knowledge — which is typically measured in terms of how many bits 
of information an observer has about a system — is the focus of many 
other approaches to reconstruction, too. As physicists Caslav Brukner 
and Anton Zeilinger of the University of Vienna put it, “quantum physics 
is an elementary theory of information”. Meanwhile, physicist 
Marcin Pawlowski at the University of Gdansk in Poland and his col- 
leagues are exploring a principle they call ‘information causality’. This 
postulate says that if one experimenter (call her Alice) sends m bits of 
information about her data to another observer (Bob), then Bob can 
gain no more than m classical bits of information about that data — no 
matter how much he may know about Alice’s experiment. 
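
One way to see what the postulate forbids is to simulate a hypothetical device, often called a Popescu-Rohrlich box, whose correlations are stronger than quantum mechanics allows. The Python sketch below is an illustration under that assumption, not a model taken from the group's paper: the box is simulated by an ordinary function, and all names are invented. Alice holds two bits and sends only one, yet Bob can recover whichever of her bits he chooses.

```python
import random


def pr_box(x, y):
    """Hypothetical box whose outputs always satisfy a XOR b == x AND y."""
    a = random.randint(0, 1)          # Alice's output looks random on its own
    b = a ^ (x & y)                   # Bob's output is correlated with it
    return a, b


def bob_guess(a0, a1, wanted):
    """Alice sends ONE bit; Bob recovers whichever of her two bits he wants."""
    a_out, b_out = pr_box(a0 ^ a1, wanted)
    message = a0 ^ a_out              # the single classical bit Alice transmits
    return message ^ b_out


results = [
    bob_guess(a0, a1, wanted) == (a0, a1)[wanted]
    for a0 in (0, 1)
    for a1 in (0, 1)
    for wanted in (0, 1)
    for _ in range(100)
]
print(all(results))
```

Bob's guess is always right, so one transmitted bit gives him access to either of Alice's two bits. That is the kind of behaviour that information causality excludes.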

Pawlowski and his colleagues have found that this postulate is 
respected by classical physics and by standard quantum mechanics, 
but not by alternative theories that allow for stronger forms of entan- 
glement-like correlations between information-carrying particles. For 
that reason, the group writes in their paper, “information causality 
might be one of the foundational [...] 

[...] mathematical possibilities. “It turns out that many principles 
lead to a whole class of probabilistic theories, and not specifically 
quantum theory,” says Schlosshauer. [...] probabilistic theories. This 
allows us to focus on the question of what makes quantum theory unique.” 


POISED FOR SUCCESS? 

Hardy says that the pace of quantum-reconstruction efforts has really 
picked up during the past few years as investigators begin to sense they 
are getting some good handles on the issue. “We're now poised for some 
really significant breakthroughs,” he says. 

But how can anyone judge the success of these efforts? Hardy notes 
that some investigators are looking for experimental signs of the higher- 
order quantum correlations allowed in his theory. “However, I would say 
that the real criterion for success is more theoretical,” he says. “Do we 
have a better understanding of quantum theory, and do the axioms give 
us new ideas as to how to go beyond current-day physics?” He is hopeful 
that some of these principles might eventually assist in the development 
of a theory of quantum gravity. 

There is plenty of room for scepticism. “Reconstructing quantum 
theory from a set of basic principles seems like an idea with the odds 
greatly against it,” says Daniel Greenberger, a physicist who works on 
quantum foundations at the City College of New York. Yet Schlosshauer 
argues that “even if no single reconstruction program can actually find a 
universally accepted set of principles that works, it’s not a wasted effort, 
because we will have learned so much along the way”. 

He is cautiously optimistic. “Once we have a set of simple and physically 
intuitive principles, and a convincing story to go with them, quantum 
mechanics will look a lot less mysterious”, he says. “I think a lot of the outstanding 
questions will then go away. I’m probably not the only one who 
would love to be around to witness the discovery of these principles.” 


Philip Ball is a freelance writer based in London. 

A single market for 
European research 


European collaboration is not far behind that in the United States, but there is still 
work to be done on cross-border funding and financial inequalities, says Paul Boyle. 


Building the European Research 
Area (ERA) has been the priority of 
Máire Geoghegan-Quinn since her 
appointment as the European Commissioner 
for Research, Innovation and Science in 2009. 
Her mandate stipulates that the ERA should 
“ensure the free circulation of researchers, 
knowledge, ideas and technology” across 
the European Union (EU; see go.nature.com/qdlyri) 
— like an academic equivalent 
of the European single market for goods 
and services. In March 2012, the European 
Council called for it to be completed by 2014 
(see go.nature.com/meukwn). One focus, of 
particular relevance to the national funding 
agencies, has been on increasing researcher 
collaboration and mobility within Europe. 
As president of Science Europe (represent- 
ing 53 research-funding and -performing 
organizations from 27 countries) and chief 
executive of the UK Economic and Social 
Research Council (ESRC), I believe that 
the proposed timetable for implementing 
the ERA is worryingly short, particularly if 
changes to funding agencies and other insti- 
tutional practices are necessary. I am also 
concerned that the changes may produce 
some undesirable, unintended consequences. 


To agree on how much collaboration is 
optimal, we need to know how research- 
ers collaborate and move within Europe. At 
Science Europe’s request, such an analysis, 
produced by scientific publisher Elsevier's 
SciVal Analytics Team, will be released this 
week. The report shows that internal European 
collaboration is not far behind that in 
the United States, but that connections with 
countries outside Europe need nurturing too. 
Researcher mobility between institutions in 
different European countries is also relatively 
low compared with movement between US 
states. To further the ERA vision, we 
should promote best practices for cross-border 
funding and minimize inequalities in 
salaries, pensions and benefits. 

In Europe, most scientific research is 
supported by national agencies, rather 
than through European or international 
programmes. Different agencies take dif- 
ferent approaches to collaboration. Some 
allocate resources across borders: the ESRC, 
for example, allows up to 30% of any grant to 
be spent on international collaborations. By 
contrast, many agencies restrict their spend- 
ing to their home nations, and legislation to 
alter this requirement would be impossible to 
attain by 2014, even if they wished to do so. 

Some organizations pay proportionately 
for researchers from their country in transnational 
projects. One approach is that collaborators 
submit a proposal to a single ‘lead agency’, 
which reviews the bid and takes the 
funding decision. Agencies from other coun- 
tries recognize the decision and pay their 
share. Although this is efficient, negating 
the need for peer review in multiple nations, 
problems can arise. Committing high pro- 
portions of an agency’s budget on the basis 
of decisions taken in another nation could 
lead to local funding panels feeling deprived 
of power. Lead-agency agreements work best 
when those involved have similar peer-review 
standards, success rates and views on research 
priorities. Agreements cannot be imposed on 
unwilling partners. 


STATE OF THE UNION 
Comparisons are often made with the greater 
level of partnership across US states (see 
go.nature.com/w87clf). But, unlike Europe, 
the US states are part of a federal system, with 
one language, consistent labour-market con- 
ditions and a single national funding system. 
To provide a benchmark, Science Europe 
asked Elsevier to draw on publication data 
from their bibliographic database Scopus 
on the level of collaboration and mobility in 
Europe and the United States. The report 
will be launched in Brussels on 16 September. 
The analysis found that, in 2011, 13% of 
papers with a European primary author 
included co-authors from more than one 
country in Europe, compared with 16% of 
US primary-author papers that involved 
inter-state collaborations (see ‘Affiliation 
trends’). This difference is small. But it is 
intercontinental collaborations — those 
that involve authors outside Europe or the 
United States, respectively — that tend to 
produce the most highly cited papers. So it 
is concerning that European scientists have 
fewer such partnerships (23% of joint papers) 
than their US counterparts (30%). 

Relatively few academics seem to move 
between countries in Europe. The report 
finds that between 1996 and 2011, only 
7% of researchers’ affiliations switched 
between European countries, according to 
their addresses. In the United States, 22% of 
researchers published from institutions in 
more than one state in the same period. Bar- 
riers to movement across European borders 
might include language, benefits systems and 
cultural differences. 

Although many funding agencies cannot 
legally allocate their resources to another 
nation, some have agreed a ‘money follows 
researcher’ policy, allowing researchers to 
transfer grants if they relocate within Europe. 
But even this simple scheme has challenges. 
Who owns the intellectual property from 
the grant-funded work when a researcher 
moves? Why would an academic move to a 
country where salaries or pensions are lower? 


AFFILIATION TRENDS 
Researcher mobility and collaboration, revealed through publication data, are lower across Europe 
than across the United States, in part owing to language and cultural barriers. 
MOBILITY (1996-2011): stayed within country/US state; mobile within Europe/US; mobile beyond Europe/US. 
COLLABORATION (2007-2011): within country/state; within Europe/US; outside Europe/US. 
*Percentages exclude transitory mobility and single-author/-institute papers. Europe refers to the 41 countries eligible for 
Seventh Framework Programme funding: 27 EU member states and 14 associated countries. 

How long would a national agency support 
such a policy if several researchers and their 
grants emigrated but few immigrated? And 
would the strongest researchers concentrate 
in the best-performing countries and institu- 
tions, perhaps increasing European excel- 
lence overall, but punishing nations that are 
still building their science portfolios? 

Progress on various fronts is required. 
First, regular monitoring of researcher col- 
laborations and movements in Europe is 
needed. Funding agencies, universities and 
the EU should work together to collect dif- 
ferent parts of this information. 

Second, best practices, drawn from the 
various schemes, should be adopted across 
Europe. These should be administratively 
simple and avoid double peer review, where 
possible. Clarity is required on terminology 
— even words such as ‘grant’ or ‘evaluation’ 
can mean different things in different organi- 
zations — and on how funding mechanisms 
are implemented and communicated to 
researchers. Science Europe’s working group 
on cross-border collaboration is now prepar- 
ing such guidance. 

Third, we need a forum that brings together 
research-funding and -performing organiza- 
tions, the European Commission and min- 
isterial representatives from member states. 
Science Europe has committed to hosting 
such a high-level ERA workshop annually. 

Fourth, European funding agencies 
should not overemphasize European col- 
laboration at the expense of global partner- 
ships. Best-with-best partnerships should be 
encouraged, wherever they might be. 

Fifth, research-funding and -performing 
organizations, through Science Europe, and 
universities must work with the European 
Commission to identify and solve barriers 
to mobility in the labour market, welfare and 
administrative systems. These include issues 
such as pensions portability, coordination 
of social-security systems and transparent 
recognition of educational qualifications. 

The ERA should be an evolving, flexible 
and creative space in which researchers, ideas 
and knowledge circulate freely to respond to 
society's challenges. At its heart will be trust. 
The establishment of Science Europe is itself 
testament to the willingness of European 
research agencies to engage in shaping a better 
research landscape. 


Paul Boyle is chief executive of the UK 
Economic and Social Research Council 
in Swindon, UK, and president of Science 
Europe in Brussels. 

e-mail: esrc.ceo@esrc.ac.uk 

The purported decline in bee numbers raises questions about evidence quality. 


A standard for 
policy-relevant science 


Ian Boyd calls for an auditing process to help policy-makers to navigate research bias. 


The increasing concern about unreliability 
in scientific literature is a 
problem for people like me — I am 
the science adviser to DEFRA, the UK 
government department for environment, 
food and rural affairs. To counsel politicians, 
I must recognize systematic bias in research. 
Bias is cryptic enough in individual studies, 
let alone in whole bodies of literature that 
contain important inaccuracies. 

It worries me that because of bias, some 
parts of the published scientific literature, 
such as studies on the safety of genetically 
modified (GM) organisms and pesticides, or 
trends in biodiversity measurements, might 
have only limited use in policy-making. 

To mitigate this problem, policy-makers 
should consider holding published scientific 
evidence to an audited standard that can be 
replicated and is robust to variations in assessor 
competence. A weighting factor, or ‘kite 
mark’, applied to journals or individual articles, 
could help policy-makers to assess the 
robustness of studies for use in particular 
applications. Similar methods established by 
non-profit standards associations are used in 
research to certify laboratory practice and in 
engineering to certify building standards. 

The quality of research results fluctuates 
because of varying tractability in the problems 
being probed. For example, it is easy 
to judge the efficacy of an experiment to engineer 
a tomato to produce the pigment anthocyanin, 
because if it succeeds, that tomato is 
the colour of a ripe plum. It is much harder to 
judge the reliability of a study investigating 
whether a GM crop is toxic to animals. The 
latter situation is much more susceptible to 
inaccuracy and interpretation. 

These problems are amplified in complex 
issues such as the environmental effects 
of GM organisms or chemical pollutants, 
including pesticides and endocrine disrupt- 
ers. In these cases, experimentation is needed 
at scales large enough to provide statistical 
power in the presence of high background 
noise. The problem is amplified further when 
statistical inference is used. 


SCOPING THE PROBLEM 

Systematic bias across whole fields of science 
is even more cryptic and therefore more prob- 
lematic. It could stem from the combined 
effects of how science is commissioned, 
conducted, reported and used, and also 
from how scientists themselves are incentivized 
to conduct certain research. Such bias 
results from actively searching for a particular 
outcome, rather than performing balanced 
hypothesis testing. For example, in 2006, 
researchers in the United Kingdom and in the 
Netherlands found that the number of insect 
pollinators might have declined. A consequent 
call for proposals (see go.nature.com/audhny) 
contained the underlying assumption 
that there was a decline, rather than conveying 
a need to establish whether current 
information about declines was robust. 
Another problem is the tendency to treat 
different studies as statistically independent, 
even when they have emerged from con- 
nected commissioning processes and could 
therefore amount to multiple testing of the 
same hypothesis, meaning that every extra 
study must overcome an increasingly rigor- 
ous statistical hurdle to demonstrate efficacy. 
In combination, these kinds of bias can make 
individual or groups of studies that report cer- 
tain effects seem more important than they 
really are. I suspect that these effects could 
be a factor in the continuing controversies 
surrounding genetic modification and the 
failure of the EU regulatory system to process 
applications to license new GM products. 
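
The multiple-testing worry raised above can be made concrete with a short calculation. The Python sketch below assumes that the studies are statistically independent tests at a common significance level, and it uses the Bonferroni correction as one standard, conservative way of raising the hurdle for each extra study. Neither assumption comes from the article, and the function names are illustrative.

```python
def familywise_error_rate(alpha: float, tests: int) -> float:
    """Chance of at least one false positive across independent tests at level alpha."""
    return 1 - (1 - alpha) ** tests


def bonferroni_threshold(alpha: float, tests: int) -> float:
    """Per-test significance level that keeps the overall rate at or below alpha."""
    return alpha / tests


# How the risk of a spurious 'effect' grows as more studies test the same hypothesis.
for k in (1, 5, 20):
    rate = familywise_error_rate(0.05, k)
    threshold = bonferroni_threshold(0.05, k)
    print(k, round(rate, 3), threshold)
```

With twenty studies each tested at the conventional 5% level, the chance that at least one reports an effect by accident is well over one-half, which is why a positive result from the twentieth study deserves more scrutiny than the first.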

A common reaction to such controversy 
is to commission subject reviews or meta-analyses 
that assess the weight of evidence for 
certain effects across many individual studies. 
Ideally, reviewers would use processes similar 
to those deployed in the Cochrane Reviews 
that inform decision-making in health care. 

But reviews also contain pitfalls. First, they 
risk amplifying rather than eliminating sys- 
tematic bias — which could be more common 
in some subjects than others. Second, they can 
be affected by the increasing tendency not to 
publish ‘negative’ results. Meta-analyses can 
compound the prevalence of false positives in 
the literature, and can be blind to unreported 
true negatives. We need rules for how to deal 
with these issues when compiling literature 
reviews for policy-relevant research. 


SEAL OF APPROVAL 

Strict procedures govern experimental design 
and the evidence standards for trials that are 
used to determine the efficacy and safety of 
GM organisms, pesticides or drug therapies. 
But once products are licensed for use, they 
are often subject to less formal investigations. 
The same relaxation of rules applies to testing 
the efficacy of policy interventions. Ad hoc 
studies, with all the problems outlined above, 
can then carry disproportionate political 
weight when their results question the operational 
integrity of a licensed product, or the 
effectiveness of a policy. Quality-control 
criteria are needed for these studies that are 
outside a regulatory framework. 

We need an international audited stand- 
ard that grades studies, or perhaps journals. 
It would evaluate how research was com- 
missioned, designed, conducted and 
reported. This audit procedure would assess 
many of the fundamental components 
of scientific studies, such as appropriate 
statistical power; precision and accuracy of 
measurements; and validation data for 
assays and models. It would also consider 
conflicts of interest, actual or implied, and 
more challenging issues about the extent to 
which the conclusions follow from the data. 
Any research paper or journal that does not 
present all the information needed for audit 
would automatically attract a low grade. 
Such a system would provide policy 
officials and others with a reliable way of 
assessing evidence quality, and it would 
drive up standards in scientific research to 
reverse the worrying trends that suggest 
underlying bias. 
Critics will counter that my proposed certification 
standard would be subjective and 
would shift the job of assessing quality away 
from expert peer reviewers. But in its current 
form, peer review fails to set a consistent 
standard. What I propose augments rather 
than replaces peer review, and assessment 
could be carried out on behalf of authors, 
journals or users of information through the 
use of third-party certified auditors. 

I do not underestimate the challenge of 
establishing such a system, but it would 
bring standards to scientific publishing that 
are common practice in other disciplines. 
Ultimately, this will increase the rigour and 
transparency around the scientific literature 
that is used in policy decisions.


Ian Boyd is chief scientific adviser at the UK 
Department of Environment, Food and Rural 
Affairs in London. He is also professor in 
biology at the University of St Andrews, UK. 
e-mail: ian.boyd@defra.gsi.gov.uk 
Bring on the evidence


It is time to probe whether the trend for patient and public involvement in medical 
research is beneficial, say Sophie Petit-Zeman and Louise Locock. 


Involving patients and the public as
partners in medical research, from
deciding what to study to influencing
how results are used, is an emerging force.
For some, the approach is based on common
sense and justice. Others, such as the chief
medical officer for England, Sally Davies,
feel that the advice of patients and the public
“invariably makes studies more effective,
more credible and often more cost efficient”.
The Seventh Framework Programme
(FP7), the European Union's current research-
funding instrument, stresses the importance
of patient and public involvement, known as
PPI. And the Patient-Centered Outcomes
Research Institute in Washington DC has
allocated US$68 million to a research network
predicated on the principle that “the interests
of patients will be central to decision-making”
(see go.nature.com/mdhy6i).
PPI is a prerequisite for much UK
government research funding and it is
spreading among funders, health-care
organizations and charities. The James Lind
Alliance (JLA), with which one of us (S.P-Z.) 
has worked since its inception in 2004, ena- 
bles patients, carers and clinicians to agree 
on what research matters most. It explicitly 
excludes the pharmaceutical industry and 
pure researchers. After a decade of arms- 
length government support, the JLA is now 
part of the National Institute for Health 
Research (NIHR) based in Southampton, 
UK, and JLA partnerships are complete or 
underway for 25 medical conditions (see 
go.nature.com/twhvxz). For example, the 
NIHR Oxford Biomedical Research Centre 
is running partnerships in spinal-cord injury 
and joint-replacement surgery, and it is the 
first major research institution to be appoint- 
ing staff to use the JLA method ‘in house’, 
closing the loop between what matters to 
patients and what is researched in their name.

This international growth of PPI is rightly 
paralleled by unease at the paucity of evid- 
ence for its impact. And the evidence there 
is, including the findings that PPI improves 
recruitment to studies and changes what is 
researched, is weak. As Simon Denegri,
the United Kingdom's first national director
for public participation and engagement in 
research, put it: “The evidence-base for PPI’s 
impact is meagre, patchy and largely obser- 
vational.” 


SELF-EXAMINATION 

Those of us working in PPI must robustly 
examine our own practices with a common 
set of tools. Otherwise, we will struggle to 
answer PPI sceptics, such as one researcher 
who asked: “Why should patients have 
useful opinions about what directions 
research should take?”


ILLUSTRATION BY CHRIS RYAN/NATURE 


A first crucial step is to ensure consistent, 
accurate reporting of what PPI has been 
done and how. We can assess whether an 
activity is useful only if it is clear what it was. 

This challenge is being addressed through 
GRIPP (Guidance for Reporting Involvement
of Patients and Public), a checklist
published in 2011 for studies that include 
PPI to help authors and readers to critically 
appraise the work. GRIPP is being used to 
generate consensus through the EQUATOR 
Network, an international initiative that 
promotes the development and spread of 
guidelines for health-research reporting.

A key element of reporting PPI is to make 
clear who was involved, in part to allow us 
to gauge when it matters to distinguish 
between public and patient input. One 
demonstrable effect of PPI is that it helps to 
create user-friendly information, questionnaires
and interview schedules for patients.
But this sort of reality check about jargon 
differs from gathering and heeding patient 
experiences. 

We must also probe whether PPI is valu- 
able for all research types. Will it ever, for 
example, have a place in basic science? Anec- 
dotal evidence suggests that it might, in part 
because patients push for research into 
causes. The UK Alzheimer’s Society, the only 
funder that works with people with dementia 
and their carers to select research projects, 
backs work from the lab bench to the clinic. 
And priorities in sight-loss research, iden- 
tified through the JLA approach, revealed 
patient interest in causation, as well as treat- 
ments using stem cells and gene therapy. 

One of the knottiest problems in PPI 
is how to best weigh up anecdotes and 
evidence. How are the patients involved 
chosen? Do they bring more than their own 
views? Are diverse voices heard, or just those 
that are loudest? 

Ignore such questions, and PPI might 
unwittingly perpetuate power imbal- 
ances. Patients can achieve involvement 
through existing networks, but not all will 
be part of these, or they might be chosen 
by a researcher who is keen to work with 
a kindred spirit. The most well-meaning 
approaches can simply extend input from 
educated, middle-class professionals to input 
from educated, middle-class patients. 

Yet we must also avoid double standards. 
Just as people will always want the best 
researchers or clinicians, we must not exclude 
the most informed or articulate patients.

There is no easy fix, but the ability of 
involved patients to represent wider views 
can be optimized through routes such as the 
website www.healthtalkonline.org, led by 
the University of Oxford’s Health Experi- 
ences Research Group (HERG), where one 
of us (L.L.) is deputy research director. 

Healthtalkonline and its sister site, 
www.youthhealthtalk.org, contain video,
audio and written records of nearly
3,000 people’s experiences of more than 
75 health-related issues. The websites allow 
patients and professionals to broaden their 
knowledge of what it is like to be ill or to 
make difficult health-care decisions. 

Using qualitative research, interviews 
with subjects continue until no major new 
themes emerge, indicating that a compre- 
hensive set of views has been gathered. As 
Sue Ziebland, HERG’s research director, 
explains: “Supporting patients involved in 
research to draw from a pool of views helps 
defend them from accusations that they 
bring only their own agenda.”


BUILDING A CASE 
Gathering the evidence base for PPI will take 
time. The methodological issues described 
here must be addressed, and the crucial ques- 
tion — whether research using PPI makes life 
better for patients — is complex. A project 
funded by the UK Medical Research Council 
last week launched its Public Involvement 
Impact Assessment Framework, a resource 
to support research teams to develop impact- 
assessment tools appropriate for their work. 
As PPI matures, we must find ways to 
ensure that those who do it, be they profes- 
sionals, patients or public participants, are 
offered support and training — perhaps 
most crucially to help them to understand 
each other’s worlds. We must then report,
dissect and assess involvement, devising
impact measures with patients as partners, 
in ways that optimize the potential of 
patient-centred science.
BOOKS & ARTS


Stephen Hawking, the subject of both a new memoir and a documentary film. 


A cosmological life 


Robert P. Crease weighs up two takes on Stephen 
Hawking — including the theoretical physicist’s own. 


True to its name, Stephen Hawking’s
autobiography My Brief History is a
model of brevity, at just 20,000 words.
A new documentary about the renowned
theoretical physicist, Hawking, takes longer
to watch than the book does to read. These
separate projects add little to our understanding
of Hawking, but do feed our insatiable
curiosity about him.

Both vehicles do a creditable job of review- 
ing the outlines of Hawking’s story, which is 
a compelling one. In 1963, while a graduate 
student at the University of Cambridge, UK, 
Hawking learned that he had motor neuron 
disease (known in the United States as Lou 
Gehrig's disease). In 1970, he began work on 
the theory of black holes, predicting in 1974 
that they emit radiation and could therefore 
potentially evaporate: ‘Hawking radiation’ 
is probably his most significant scientific 
contribution. Gradually losing mobility and 
speech, he survived thanks to an indomi- 
table spirit, devoted assistants and increas- 
ingly sophisticated technology. (He wrote the 
memoir using a sensor attached to his glasses 
that responds to cheek-muscle twitches.) 
Hawking’s A Brief History of Time (Bantam, 
1988) is surely the most popular science book 
ever. And he has become, to the public, the 
greatest living scientist, as well as a media
magnet who enjoys making outrageous
claims (“Philosophy is dead”).

My Brief History
STEPHEN HAWKING
Bantam: 2013.

Hawking
DIRECTED BY STEPHEN FINNIGAN
Cambridge Film Festival, UK. 19 September 2013.

Meanwhile, Hawking has a solid, if not 
superhuman, reputation among scientists. In 
1999, a Physics World survey asked eminent 
physicists to name the five physicists they 
thought had made the most significant con- 
tributions. Of the 61 named — 11 of them 
living — Hawking received only one vote. 

My Brief History does not do what we 
expect of a memoir. It does not take the 
reader behind any scenes. Hawking nar- 
rates his life non-introspectively, celebrat- 
ing its triumphs and burnishing its sensitive 
moments. It is a concise, gleaming portrait, 
not unlike those issued by the public rela- 
tions department of an institution. “Not 
knowing what was going to happen to me 
or how rapidly the disease would progress, 
I was at a loose end,” he writes of his reac- 
tion to being diagnosed. Nor is he reflective 
about his relationships with his ex-wives 
Jane Wilde and Elaine Mason. “My marriage 
to Elaine was passionate and tempestuous,”
he writes. “We had our ups and downs.” 

The book provides no revelations, deep 
insight, messy details or score-settling, and 
does not explore his celebrity status. Hints 
of emotion are rare. At one point he recalls 
thinking himself a tragic figure, and begins 
to listen to Wagner. At another he remem- 
bers seeing a boy in the hospital bed next 
to him die of leukaemia. “Whenever I feel 
inclined to be sorry for myself, I remember 
that boy.” You sense something like a soul 
behind Hawking’s dispassionate account, 
but, like a black hole, its existence has to be 
deduced from external indications. 

Hawking, directed by Stephen Finnigan, 
is a good celebrity biopic. In it, Hawking 
narrates many of the same events as in My 
Brief History, often in the same words. But 
the film is diverting in a way that the book is 
not: we get to see clips of Hawking’s appear- 
ances as himself or in cartoon form on US 
television shows such as Star Trek and The 
Simpsons. We hear Wagner and Pink Floyd 
on the soundtrack, watch re-enactments of 
episodes from Hawking’s early life, and see a 
snapshot of Hawking with President Obama 
and First Lady Michelle. 

Hawking’s synthesized voice can lend an 
aura of gravitas to words that would come 
off as platitudes on the page. “When you are 
faced with the possibility of an early death, it 
makes you realize that life is worth living”, he
says. Occasionally, even the synthesizer can- 
not rescue his pronouncements from sound- 
ing naive: “Sometimes I wonder if I am as 
famous for my wheelchair and disabilities as 
I am for my discoveries.” The film concludes
with spectacular footage from the 2012 
Paralympics in London, at which Hawking 
opened the ceremonies. 

These two offerings add to a growing list of
works, others of which do offer new knowledge
about the cosmologist. Jane Hawking’s Travelling to Infin-
ity (Alma Books, revised, 2008) divulges 
aspects of their relationship. Kitty Ferguson’s
Stephen Hawking: An Unfettered Mind
(Palgrave Macmillan, 2012) explores his life
and illness. Hélène Mialet's Hawking Incor-
porated (University of Chicago Press, 2012) 
analyses the networks of people, media and 
technologies without which there would be 
no Stephen Hawking as we know him. 

Mialet demolishes the popular myth of 
Hawking as a solitary genius. But it is an 
irresistible one. “Thank you for coming on a
journey through my world,” Hawking says at
the conclusion of the documentary. In truth, 
Hawking is a carefully stage-managed tour 
of only part of that world, yet it is a skilful, 
entertaining and moving trip nonetheless.


Robert P. Crease is professor of philosophy 
at Stony Brook University, New York, and 
Evolution of a mind


Eugenie C. Scott revels in the first volume of Richard Dawkins’s frank new memoir. 


Richard Dawkins is one of the world’s
best-known scientists, largely
because of his tireless promotion
of evolution and, more recently, atheism, 
through a succession of best-sellers, videos 
and public appearances. In An Appetite 
for Wonder, the first part of a two-volume 
memoir, Dawkins presents his life up to the 
publication of The Selfish Gene (Oxford 
University Press, 1976), the popular- 
science blockbuster that kick-started his 
career as a spokesperson for evolution- 
ary biology. 

As befits someone whose life’s work 
has focused on the cumulative changes 
through time that we call evolution, 
Dawkins wants to share the forces and 
factors in his life that have shaped the 
person he has become. A bookish child, 
he neglected many opportunities pro- 
vided by his nature-loving parents to 
explore and be inspired by the natural 
world. Given that he was born in Nairobi 
and spent his early years in what is now 
Malawi, it seems curious that the natural 
abundance around him did not kindle 
that spark. He does speak of developing 
a fondness for animals, but more through 
reading children’s books such as Hugh 
Lofting’s Doctor Dolittle series, noting, 
“I learned late to love watching wild
creatures, and I have never been such an
outdoor person as either my father or my
grandfather.” 

The African interlude came about 
because Dawkins’s father, Clinton John 
Dawkins, followed family tradition by 
working in the British foreign service. 
Dawkins presents his father, who was 
a forester, as independent, resourceful, 
inventive and willing to take big risks. When 
a distant relative unexpectedly bequeathed 
to John Dawkins the country estate Over 
Norton Park in Oxfordshire, UK, he moved 
the family back to England, renounced his 
government pension and turned the land 
into a working farm. Richard, then eight, was 
soon packed off to boarding school. 

The picture Dawkins paints of his schools 
is, if not quite Dickensian, pretty awful: little 
boys lining up naked for cold morning baths, 
shivering in unheated rooms, choking down
bad-tasting food, enduring corporal punishment
and bullying, and not always receiving a
sterling education. Dawkins was, he
confesses, an indifferent student, but a few
teachers managed to stimulate an interest in
science. It is clear that he did not come into
his own intellectually until he was admitted,
by the skin of his teeth, into the University
of Oxford. There, at Balliol College,
Dawkins was lucky enough to be tutored
by the great ethologist Nikolaas ‘Niko’
Tinbergen, and became fascinated by animal
behaviour.

Richard Dawkins studying mating calls in crickets in 1976.

NATURE.COM
For a celebration of The Selfish Gene, see:
go.nature.com/383s7y

The latter part of the book traces Dawk- 
ins’s intellectual path from Oxford to his 
assistant professorship in zoology at the 
University of California, Berkeley (where 
he was swept up in the movement against 
the Vietnam War), and his return to Oxford 
as a young scientist. It describes his transi- 
tion from a quantitatively oriented com- 
puter-programming experimentalist who 
modelled chick behaviour to the more theo- 
retically oriented scientist we know today. 
I doubt if more than 1 in 10,000 readers of 
The Selfish Gene ever read his earlier experi- 
mental work. (I hadn't, either.) Inspired by 
the 1970s ferment over group and individ- 
ual selection, kin selection and the like, that 
book put the young Dawkins on that
short list of scientists who are able to make
complicated scientific ideas understandable
and exciting without oversimplifying them.

An Appetite for Wonder: The Making of a Scientist

Dawkins is a polarizing figure, both 
widely praised and widely criticized. 
Supporters as well as detractors may 
be surprised at the honest depiction 
of an individual who comes across as 
both less saintly and less diabolical than 
media caricatures may have led them 
to expect. He loves poetry, and readily 
confesses to choking up when reading 
sentimental verse such as Hilaire Belloc’s 
1910 To the Balliol Men Still in Africa. 
He reflects with chagrin on his lack of 
concern about the rampant bullying that 
took place at his schools, and his support 
of what he now considers the bullying of 
some Berkeley faculty members by radi- 
cal professors and students during the 
Vietnam War protests. Of the Berkeley 
experience, he notes, “I was still young, 
but not all that young. Should have 
known better.” He readily confesses that 
scientific habits were slow in coming. It 
is a very honest book.

Charles Darwin wrote his autobi- 
ography for his family; for whom is 
Dawkins’s written? I found the enumera- 
tion of ancestors in early chapters a bit 
of a slog. The middle chapters describ- 
ing his childhood in Africa and school 
days in England should be of interest 
to all. The latter chapters, dealing with 
his mathematical modelling of evolu- 
tion, might appeal more to scientists. 

Dawkins’s atheist fans are not all necessarily
in that camp, and might find the graphs and 
diagrams of behaviour patterns daunting. 

However, throughout and as usual, Dawk- 
ins’s writing is graceful, sparkling with anec- 
dotes and wit. Those of us who struggle in 
our writing will be comforted to read the 
words: “Pretty much every sentence I write 
is revised, fiddled with, re-ordered, crossed 
out and re-worked.” In other words, the 
elegant, functional design of his writing is 
accomplished through variation and selec-
tion. Why am I not surprised?


RICHARD DAWKINS 
Bantam: 2013. 
Researchers working at the Laser Interferometer Gravitational-Wave Observatory.


SOCIOLOGY OF SCIENCE 


Chasing the 
gravitational wave 


Marianne de Laet enjoys a sociological analysis of how 
a select group of physicists works. 


In Gravity’s Ghost and Big Dog, British
sociologist Harry Collins documents
the astrophysical search for the elusive
gravitational wave. In part an account of
sociological fieldwork among scientists in
the field and part astronomy-history mystery,
Collins's book is a terrific read informed
by almost 40 years of research.

The book homes in on two sudden energy
surges, thought to be the first truly significant
detections of gravitational-wave signals, that
got the astrophysics community in a stir: the
Equinox Event in 2007 and Big Dog in 2010.
The book's first part was previously published
in 2010 as Gravity’s Ghost (University
of Chicago Press), a stand-alone account of 
the Equinox Event. The new, second part 
documents the frenzy over Big Dog — so 
named because the signal was bigger than 
the 2007 surge and seemed to come from the 
constellation Canis Major — when the stakes 
were higher and the future of the field hung 
in the balance. As an ‘embedded’ observer, 
participant, apprentice and analyst working 
among the astrophysicists, Collins reports
on meetings, telephone conferences, e-mail 
discussions and social events. By conversing 
with and analysing scientists at research facili- 
ties, scientific conferences and home institu- 
tions, he offers a glimpse of the ways in which 
knowledge is made in this esoteric field. 

Predicted by Albert Einstein's general the- 
ory of relativity, first published in 1915, gravi- 
tational waves are formed when the mass of 
an imploding star disappears, causing ripples 
in space-time. Ground-based interferometers
built to detect them have been in use for decades,
but Collins focuses on the Laser Interferometer
Gravitational-Wave Observatory
(LIGO), which has sites at Hanford, Wash- 
ington, and Livingston, Louisiana. 

Two questions drive the narrative: are the 
signals real? And do they prove that gravita- 
tional waves exist? Collins lets us in on the 
conversations among the scientists. At times, 
they focus on the rigour of statistical mar- 
gins by asking at which margin error can be 
ruled out; at others, they discuss the seman- 
tics of knowing, pondering whether they 
are confronted with observation, evidence 
or fact. He intersperses these exchanges 
with historical information and his own 
sociological analysis of gravitational-wave 
research. As the astrophysicists debate the 
two detection events, it emerges that decid- 
ing whether a gravitational wave exists is 
not just a matter of observation, calibration 
and learning to understand LIGO’s idiosyn- 
crasies. It also involves semantics, political 
considerations and attending to expectations 
from funders, media and the public. 

A debate among key collaborators about 
how to announce Big Dog is a case in point. 
The degree of certainty with which a finding 
is announced sets expectations, and these find 
their way into the rationales for retooling or 
building new instruments. The validity of past 
research, the promise embedded in future 
instruments and scientific reputations are all 
explicitly at stake in the exchanges among the 
collaborators. Collins has turned the minutiae 
of these conversations into an exciting detec- 
tive story. 

Collins finds himself immersed, learning so
much science that he becomes a force to be
reckoned with. His queries are pertinent to
the research and, by his own account, that gets
the scientists to rethink strategies or
interpretations of their findings. Collins even
tries his hand at physics, and suggests an
alternative strategy for some of the calculations
for Big Dog. Flirting with the
science, he begins to think not only about
the astrophysicists, but with them. With
Collins, we begin to wonder what it takes
to be an expert in the field.

Gravity’s Ghost and Big Dog: Scientific
Discovery and Social Analysis in the
Twenty-first Century
HARRY COLLINS
University of Chicago Press: 2013.

LIGO LABORATORY

In social science, going native in this 
way is a tricky business. In an effort not to 
alienate his subjects, the sociologist may 
end up relying too much on the group’s 
own interpretations of their actions, lead- 
ing to less insightful renderings of their 
world. If, by contrast, he lacks credibility 
with the group, his presumption to reveal 
something new may offend. Collins walks 
a tight line deftly; although his interlocutors
are somewhat incredulous at his
efforts to do physics, the group does
engage with them. That does not
make Collins an expert within
their culture, only about it. And
here, in trying to
justify his calculations by asserting that 
they are not far off, he violates a golden 
rule of the sociology of science. Under- 
standing the process of knowledge- 
making is not predicated on whether the 
knowledge in question is right or wrong. 
At the same time, Collins defies a best 
practice of anthropology — to examine 
one’s motifs and motivations. He is not 
the first researcher in the sociology of 
science whose observation of scientific 
expertise turns into the desire to possess 
it. This interesting feature of the analy- 
sis of science remains unexamined in 
Collins’s book. 

That said, Collins's respect for science 
compels him to make a lovely observa- 
tion: that there is something admirable 
about the modes of science — its aspi- 
ration to honesty, persistence and truth- 
seeking — and that this should serve as 
a moral and epistemological model for 
how to behave. And if, as US anthropol- 
ogist Stefan Helmreich suggests in his 
book Silicon Second Nature (UC Press, 
1998), social analysis of science enables 
participants to “recognize something 
new of themselves”, Collins’s serial monograph
is on point. It tells of scientists who
are well aware of their own practices — 
perhaps not least as a result of having a 
sociologist in their midst.


Marianne de Laet is an associate 
professor of anthropology and science, 
technology, and society at Harvey Mudd 
College in Claremont, California, USA. 
She observed the emerging collaboration 
of the California Extremely Large 
Telescope, renamed in 2003 as the Thirty 
Meter Telescope. 

e-mail: delaet@g.hmc.edu 
Mapping memory’s lanes


Alison Abbott sees the science and poetry in a
penetrating study of reminiscence in the elderly. 


The Nostalgia Factory: Memory, Time and Ageing
DOUWE DRAAISMA
Yale University Press: 2013.

In 1688, Swiss physician Johannes Hofer
coined the term nostalgia to describe the
clinical symptoms of homesickness. Hofer
linked the Greek words nostos, for homecoming,
and algia, for pain. Nostalgia could even
be life-threatening, he noted. In The Nostalgia
Factory, psychologist Douwe Draaisma
touches on this phenomenon in his exploration
of memory and ageing, which draws on
scientific work and anecdotal case studies.

He tells of the Dutch emigrants after the
Second World War who, crippled by homesickness,
later found that return offered
no respite: ‘home’ no longer matched their
memories of it. He describes the twentieth-century
emigration agencies as ‘nostalgia factories’.
But the real nostalgia factory, he says,
is time, which “makes emigrants of us all”.
In old age, memories are all that remain of the
land of our youth, even if we never left home.
His beautifully written book attempts to capture
the nebulous essence of reminiscence in
eight elegant, authoritative essays.

Draaisma’s style is both literary and scientific.
It calls to mind the works of Oliver Sacks,
who popularized neuroscience through the
intriguing stories of his neurological patients.
(In fact, one of Draaisma’s essays is based on
a conversation with Sacks about “what time
does to memories and what memories do
to time”.) But Draaisma’s style is perhaps the
more poetic, which is what makes the powerful
insights in his book so penetrating.

It is disconcerting to learn from Draaisma
how unstable and adrift our own biographies
are: we constantly reconstruct our lives as
our memories are refiltered through new
experiences. Draaisma offers both reassurance
and warnings. Forgetting names rarely
foretells dementia, and many old people who
describe themselves as forgetful have a
well-functioning memory when tested objectively.
But claims that food supplements, enriched
environments or computer training programs
can halt the natural process of harmless
age-related forgetfulness are hokum. “Anyone
who thinks that such tricks ... can actually give
them a better memory probably also thinks
they would be able to walk better if they used
a walking frame,” notes Draaisma.

For many, The Nostalgia Factory will be
what Draaisma refers to as a ‘decisive book’,
one that changes one’s perspective on life.
Or maybe that would be true only for the
young? Because in this book we also learn
about the ‘reminiscence-curve bump’.
[…]ers then date these to the year they
happened. The number of
memories recalled for
each age, beginning
at age three or four,
follows a predictable
pattern. Numbers rise to a peak at around 
20 years old, then fall rapidly, flattening out 
at an alarmingly low level well before middle 
age. The curve rises again in later life, when 
people recall things simply because they hap- 
pened recently. Middle age barely registers. 
So it is not surprising that most people, if 
asked to name a decisive book, cite one they 
read before they were 23 — a time they will 
also describe, unprompted, as ‘their era’, when
pop music was best. Its reflection can be seen 
in almost all autobiographies. In Peeling the 
Onion (Harvill Secker, 2007), Nobel-prize-winning
author Günter Grass devotes the
majority of pages on the first 30 years of his 
life to those between the ages of 17 and 21. 
Draaisma argues that the bump in the rem- 
iniscence curve has less to do with the ability 
of the young adult brain to store memories 
efficiently and more to do with the quality of 
memories accrued as we set out on independ- 
ent adult life. Indeed, the bump advanced by 
more than a decade in a study of people who
had migrated in their mid-thirties. Sacks, 
who emigrated from England to the United 
States in the 1960s at the age of 27, muses in 
his conversation with Draaisma that when he 
recorded his own year-by-year reminiscences 
during long driving trips, his tapes for the 
1970s overflowed, but thereafter the length 
of the tape “decreased almost linearly”. 
Rigorous psychological research into the 
reminiscence phenomenon, whether healthy 
or distorted, is relatively new. This short 
book shows that it can reveal much about 
who we are at different stages of our lives.


Alison Abbott is Nature’s senior European
correspondent. 
Turkey’s scientists
can’t ignore politics 


Given the severity of the political 
problems that scientists face 

in Turkey, academics need to 
become more politically involved 
— not less, as you suggest — to 
help bring about reform to the 
country’s academic system (see 
Nature 500, 253; 2013). 

There is no clear evidence to 
support your implication that the 
debate over the headscarf ban in 
Turkish universities has affected 
the quality of their scientific 
research. Other long-standing 
impediments are the real culprits. 

Scientists in Turkey have 
had to contend for years with a 
lack of academic freedom and 
transparency in grant-review and 
faculty-recruitment processes. 
The small size of the academic 
community and a clumsy 
bureaucracy further obstruct the 
nation’s research potential. 

There are some notable 
successes, however. Contrary 
to your description of Boğaziçi
University as “cash-starved”, in
2010 researchers there gained 
funding of close to US$13 million, 
of which more than $2 million 
came from highly competitive 
international sources (see 
go.nature.com/wik8xn). 

Ozan Aygün Massachusetts
Institute of Technology, 
Cambridge, USA. 
oaygun@mit.edu 


Transcranial devices 
are not playthings 


Controlled investigation of 
transcranial direct-current 
stimulation (tDCS) for treating 
neuropsychiatric disorders or 
for neurorehabilitation should 
not be confused with improvised 
devices or practices that apply 
electricity to the brain without 
reference to established protocols 
(see Nature 498, 271-272; 
2013). Unorthodox technologies 
and applications must not be 
allowed to distort the long-term 
validation of tDCS. 
Experimentation outside
established and tested norms
may put subjects at risk. In tDCS, 
the delivered dose of electrical 
brain stimulation (defined by the 
waveform and intensity applied) 
and the electrode size, number 
and position are all crucial. Safe 
and effective dose ranges have 
been established in clinical trials. 
Patients receiving tDCS do so in 
a controlled environment, under 
guidance from institutional ethics 
review boards and with strict 
criteria for patient inclusion. 
Meddling with the tDCS dose 
is potentially as dangerous as 
tampering with a drug’s chemical 
composition. Painstaking efforts 
by researchers to understand 
the risks and benefits of tDCS 
should never be interpreted as 
encouraging such practices. 
Marom Bikson City College of 
New York, USA. 
bikson@ccny.cuny.edu 
Sven Bestmann University 
College London, UK. 
Dylan Edwards Burke-Cornell 
Medical Research Institute, New 
York, USA. 
M. B. declares competing financial 
interests: see go.nature.com/Ixuxfq
for details.


Don’t compromise 
on informed consent 


I urge researchers responding
to the call for information about 
sharing data sets (M. Bobrow 
Nature 500, 123; 2013) to defend 
the important principle of 
informed consent by patients 
participating in medical research. 
The UK Health and Social 
Care Act 2012 requires the 
transfer of all National Health 
Service electronic medical 
records in England from general 
practitioners to the Health and 
Social Care Information Centre. 
The biomedical research charity 
the Wellcome Trust and the 
Human Genomics Strategy 
Group have proposed that variant 
files, which contain whole human 
genomes minus the Sanger 
Institute reference genome, 
should be attached to the records. 
The UK government's
intention is to share these data
with accredited researchers
— ranging from the Google- 
funded gene-testing company 
23andMe to private health-care 
companies and Chinese research 
institutes — but without people's 
knowledge or consent. 

Most data will be ‘pseudo-
anonymized’, with some
identifiers (such as names) 
stripped out, but with the ability 
to link back to the individual 
retained. People can opt out of 
this data-sharing system (see 
go.nature.com/azkyru), but that 
will in effect prevent members 
of the public from taking part in 
medical research. 

Abandoning informed 
consent is unlikely to benefit 
biomedical research (J. P. A. 
Ioannidis Am. J. Bioethics 13,
40-42; 2013). Researchers 
could face a dwindling source 
of data as people withdraw from 
participation to protect their 
confidentiality. 

Helen Wallace GeneWatch UK,
Buxton, UK. 
helen.wallace@genewatch.org 


HeLa genome versus 
donor’s genome 


I contend that the continual 
divergence of chromosomal 
features (‘karyotype’) and DNA 
sequence in dynamic cancer- 
cell populations undermines 
debate over ownership of the 
HeLa cancer-cell line derived 
from Henrietta Lacks six decades 
ago (see Nature 500, 121 and 
132-133; 2013). 

The HeLa genome is no longer 
Henrietta Lacks’s personal 
genome. Although the two 
share some DNA sequences, the 
similarity ends there. Lacks’s 
genome had the usual number 
of 46 normal chromosomes, 
whereas most HeLa cells have 
70-90 chromosomes and more 
than 20 translocations, some of 
which are highly complex. 

Changes in the HeLa genome 
in the past few decades have 
resulted from multiple cycles of 
genome reorganization during
the cancer process and from the 
initial cell-culture experiments. 
Considering that chromosomes 
provide the genome identity 
and blueprint, it might even be 
argued that the HeLa genome is 
no longer a human genome. 
Henry H. Heng Wayne State 
University School of Medicine, 
Detroit, USA. 
hheng@med.wayne.edu 


A forgotten history 
of sex research 


Elizabeth Pollitzer’s point about 
sex mattering in all areas of 
biology has long been considered 
an important topic (see Nature 
500, 23-24; 2013). 

For example, the Endocrine 
Society's flagship journal 
Endocrinology, of which I am
editor-in-chief, has since 2012 
required authors to include the 
sex of animals, tissues and even 
cell lines in their papers (J. D. 
Blaustein Endocrinology 153, 
2539-2540; 2012). Authors 
must justify the use of single-sex 
animals or tissues, and consider 
sex in interpreting their data. 

Research in neurobiology and 
psychology has also consistently 
recognized sex differences in 
physiology, pathophysiology, 
pharmacology and toxicology 
(see, for example, G. E. Gillies 
& S. McArthur Pharmacol. Rev. 
62, 155-198; 2010). The US 
National Institutes of Health 
supports such research and has 
emphasized its importance in 
programme announcements. 

From 1922, the US Committee 
for Research in Problems of Sex 
was funded by the Rockefeller 
Foundation for decades. In 2006, 
the Organization for the Study 
of Sex Differences, based in 
Atlanta, Georgia, was founded 
for researchers and clinicians. 

Studies on sex differences 
published before the PubMed 
online archive, electronic 
journals and searchable keywords 
should not be overlooked. 
Andrea C. Gore University of 
Texas at Austin, USA. 
andrea.gore@austin.utexas.edu
OBITUARY


Anthony James Pawson 


(1952-2013) 


Biochemist whose vision of cell signalling transformed cancer research. 


Tony Pawson’s research on protein
interactions transformed the
thinking about how cells comm- 
unicate, how proteins evolve and 
how cellular messaging goes awry in 
cancer. A creative experimenter, his 
synthesis of diverse observations in 
areas from biochemistry to mouse 
genetics and developmental biology 
led to a coherent picture of how cellular 
processes work. 

In the 1980s, early in his career, 
Pawson and his team discovered the 
Src homology region 2 (SH2). A sub- 
unit, or domain, of many proteins, 
SH2 directs how proteins interact and 
governs how cells respond to external 
cues. This finding set a path for all his 
future work. 

Pawson went on to show that combi- 
nations of a small number of domains 
could produce an enormous range 
of cellular responses. This ‘modular’ 
vision reshaped scientists’ under- 
standing of cellular regulation and 
paved the way for the development of 
drug classes that interfere with these 
protein interactions. 

But recognition did not come immedi- 
ately for Pawson. The existence of modular 
binding domains, now standard textbook 
fare, was initially received with scepticism 
by biochemists, and with benign neglect by 
molecular biologists. But as the evidence, 
largely from Pawson’s lab, grew more comp- 
elling, it could no longer be ignored. 

Pawson, who died suddenly at home on 
7 August aged 60, was born in Maidstone, 
UK, to an eminent British family. His father, 
to some the more famous Tony Pawson, 
was a champion sportsman and a house- 
hold name in Britain. A fly fisher, crick- 
eter, footballer and, later, sports writer, 
his multivalent skills loomed large for the 
younger Pawson, who was often mistaken 
for his father. It was his mother, Hilarie, 
a biology teacher, who stimulated his 
interest in science. 

Pawson read biochemistry at the Uni- 
versity of Cambridge, UK, and obtained his 
PhD in 1976, working with Alan Smith at the 
Imperial Cancer Research Fund (now Can- 
cer Research UK) on proteins encoded by the 
Rous sarcoma retrovirus. While visiting a 
friend in Cambridge, Tony met his American 
wife-to-be, Maggie. They married in 1975, 
and in 1976 moved to Berkeley, California,
where Tony began postdoctoral work on the 
protein products of avian retroviruses. 

In 1981, the couple moved to Vancouver, 
Canada, where Pawson was assistant profes- 
sor in the department of microbiology at the 
University of British Columbia. Pawson’s lab 
became immediately productive, publishing 
important papers on oncoproteins — proteins 
coded by genes that have the potential to 
cause cancer. There, Pawson struck up a collab-
oration with Mike Smith, a Nobel-prizewin-
ning chemist who invented the technique of
site-specific mutagenesis that Pawson used in 
his SH2 discovery. 

In 1985, when a research institute was 
launched at the Mount Sinai Hospital in 
Toronto (now the Lunenfeld–Tanenbaum
Research Institute), Pawson joined us as one 
of the first appointments in its molecular 
and developmental biology division. With 
the addition of developmental biologist 
Alexandra Joyner and the late molecular 
biologist Martin Breitman, the five of us 
— young, ambitious and with a pioneering 
spirit — knew that we were building some- 
thing important. 

The division was created at a propitious 
time: developmental biology was about to be 
transformed by the latest genetic technologies 
from a descriptive to a mechanistic
science, and cancer research was accel- 
erating with the discovery of oncogenes, 
tumour-suppressor genes and related 
signalling pathways. The two fields 
were about to converge with the discov- 
ery that the normal equivalents of viral 
oncogenes have crucial roles in embryo 
development. 

Within a few years, the division grew 
from having just a handful of students 
and postdocs to having more than 100 
members. Pawson was at the centre, 
partly because cell signalling was core 
to all our science, but largely because he 
loved to collaborate. To him, collabo- 
ration was as much about camaraderie 
and friendship as it was about getting 
a piece of science done. Working with 
Tony was fun, and although he was 
invited to give ten times the number 
of talks as everybody else, he would 
always give credit to his collaborators. 

Pawson’s seminars were virtuoso 
performances, and they were eagerly 
attended. His talks and more than 
450 published papers were not just 
assemblages of data, but elegantly 
presented expositions of how cells and 
organisms evolve, develop and function. At 
the time of his death, he was one of the most 
highly cited biomedical researchers. 

When Pawson received the Heineken 
Prize for biochemistry and biophysics in 
1998, he spoke at the ceremony in Amster- 
dam about the joy of discovery, the privilege 
of working with talented young people, the 
potential for advances to lead to new treat- 
ments for disease, and about the impor- 
tance of family in a scientist’s life. One 
could have heard a pin drop. 

Our strongest memories of Tony are in 
those early years at the Lunenfeld, sharing 
our latest results, writing grants, exchang- 
ing gossip and sharing family joys and 
sorrows — and watching Tony gesticulating 
wildly with his arms when he got excited. 
His enthusiasm was infectious. He will be 
greatly missed.


Alan Bernstein is president and chief 
The mathematics of murder


A mathematical model of gun ownership has been developed that clarifies the debate on gun control and tentatively suggests 
that firearms restrictions may reduce the homicide rate. 


ADELINE LO & JAMES H. FOWLER 


The mass killing of 20 children
and 6 adults on 14 December
2012 at Sandy Hook Elemen-
tary School in Newtown, Connecticut
(Fig. 1), has revived an enduring con- 
troversy about gun control in the 
United States. Gun-control advocates 
believe that widespread gun owner- 
ship increases the rate of gun-related 
crime and homicide, whereas critics 
argue that gun availability actually 
decreases gun violence because poten- 
tial assailants are less likely to commit 
such crimes if they believe citizens are 
armed. But who is right? In an article 
published in PLoS ONE, Wodarz and 
Komarova¹ describe an elegant and
highly parsimonious mathematical 
model designed to answer exactly this 
question. And, in an extremely cautious 
way, they suggest that more guns make 
things worse. 

The scientific literature suggests 
that gun homicides are influenced by 
various interwoven factors, including 
the rate of legal and illegal gun posses- 
sion, the national prevalence of armed 
and unarmed attacks, the likelihood of 
fatalities in such attacks, and the qual- 
ity and quantity of general law enforce- 
ment**. Gun-control policy is clearly only 
one key variable in a complex social system. 
But work on the issue is typically conducted 
by scholars who collect data on types of gun- 
control policy and numbers of gun homicides, 
and look for correlations while controlling 
for certain important variables — such as 
whether the locations of the homicides are 
rural or urban, or whether it is easy to obtain 
arms illegally. 

The problem is that many of these correla- 
tions are difficult to interpret. If gun deaths 
are higher in states with stricter gun-control 
laws, is this because gun restrictions cause 
higher crime or because politicians react to 
higher crime by enacting more restrictions? It 
is with this difficulty in mind that Wodarz and 
Komarova have created their formal model of 
gun ownership. 

Figure 1 | A prayer vigil for the victims of the Sandy Hook shooting.

The modelling of complex social
phenomena is not new. As early as 1974, the
economist Gary Becker took a supply-and-
demand approach to the ‘production’ of
crime and punishment, using his model to 
show how crime might be minimized with 
various public and private policies that make 
criminal behaviours ‘costly’. But Becker’s 
analysis did not explicitly tackle the thorny 
problem of gun control, and the scholars 
who followed also tended to focus more on 
abstract models. 

By contrast, Wodarz and Komarova’s model 
is explicitly designed to address gun-control 
policies and their effects. At the core of the 
model is the rate of gun ownership. Strict laws 
might lower the rate and permissive laws might 
increase it, but the mechanism itself does not 
matter — the goal of the model is to clarify 
what assumptions are necessary to measure 
the effect of overall gun ownership on the 
rate of firearms-related homicides.

Wodarz and Komarova assume that 
there is a positive relationship between 
the number of gun owners and the 
number of potential gun-related attack- 
ers. This is reasonable: if there are no 
guns, there will be no attacks with guns. 
But the authors also assume that there is 
a negative relationship between the rate 
of gun ownership and the likelihood 
that a gun-wielding attacker actually 
uses his or her weapon. This is because 
non-criminals may own guns, too. If 
a potential victim possesses a gun, a 
potential attacker might think twice 
about attacking. 

A few other factors are also included 
in the model, such as the risk of dying 
in a gun attack, and the availability and 
take-up of illegal arms in the face of 
varying levels of gun control. But the 
key insight is that there are essentially 
two perfect worlds, one in which no 
one owns a gun (meaning no one is able 
to attack) and one in which everyone 
owns a gun (meaning no one is willing 
to attack). In between, we get the worst 
of both worlds because some criminals 
have guns and they choose to use them. 
This means that the effect of gun avail- 
ability is crucially dependent on where 
we sit between these two worlds. 
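
The two-worlds argument can be made concrete with a deliberately simple sketch. The Python below is not Wodarz and Komarova's model. It is a toy that assumes, hypothetically, that the share of would-be attackers able to use a gun rises as a power of the ownership fraction g, and that their willingness to attack falls as a power of 1 - g. The function name and both exponents are illustrative assumptions.

```python
def toy_gun_homicide_rate(g, ability_exponent=1.0, deterrence_exponent=1.0):
    """Toy homicide rate for a gun-ownership fraction g in [0, 1].

    Hypothetical illustration only, not the published model.
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must lie between 0 and 1")
    able = g ** ability_exponent                 # no guns: no one is able to attack
    willing = (1.0 - g) ** deterrence_exponent   # everyone armed: no one is willing
    return able * willing


# Scan ownership levels from 0% to 100% in steps of 10%.
levels = [i / 10 for i in range(11)]
rates = [toy_gun_homicide_rate(g) for g in levels]
worst_level = levels[rates.index(max(rates))]

for g, r in zip(levels, rates):
    print(f"ownership {g:.1f} -> toy rate {r:.3f}")
print("worst ownership level on this grid:", worst_level)
```

In this toy the rate is zero at both extremes and highest in between, so reducing ownership lowers the rate on one side of the peak and raises it on the other. Where a real peak would lie depends entirely on the assumed exponents, which is why the underlying parameters matter so much.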

This is social science at its very best. Rather 
than crafting yet another highly abstract for- 
mal model, Wodarz and Komarova create a 
model that is directly relevant to an extremely 
important societal issue. And rather than 
overly emphasizing the results of their model, 
they conduct an exercise in caution, highlight- 
ing the importance of grounding models in 
sound and accurate assumptions — because a 
model is only as good as the assumptions from 
which it proceeds. 

Using their best guess about values esti- 
mated or implied from the existing literature, 
Wodarz and Komarova show that their model 
implies that stricter laws are the best way to 
reduce gun deaths. But they are quick to point 
out that there are some assumptions that are 
vital to the model for which we have no reliable 
measurement. So although the model does 
lend support to arguments in favour of gun 
control, it is fair to say that the jury is still out.

The most obvious contribution Wodarz and 
Komarova make towards the study of gun-con- 
trol policies is in highlighting key parameters 
that require further empirical investigation. 
Collaborative efforts between sociologists, 
political scientists and other scholars can now 
move forward in an objective way to advance 
our understanding of gun-control policies 
by focusing on the assumed values that are 
currently less well measured. Armed with
these, the model might help us to resolve — 
perhaps once and for all — the debate on gun 
ownership. 
An embryonic view of
tumour development 


A genome-wide screen of developing mouse embryos, performed using RNA- 
interference techniques, finds new suspects in skin cancer. But some factors seem to 
have opposing roles in cancer and normal-tissue maintenance. SEE ARTICLE P.185 


PAWEL J. SCHWEIGER & KIM B. JENSEN 


Gene function has traditionally been
studied using organisms in which 
individual genes have been either 
eliminated or artificially expressed in excess. 
The discovery in the 1990s of RNA interference 
(RNAi) — the process by which small RNA 
molecules specifically inhibit gene expres- 
sion by binding to and destroying messenger 
RNAs — established a new framework for 
investigating gene function. Until now, how- 
ever, genome-wide screens using RNAi tech- 
niques have been limited mainly to cultured 
cells, and so cannot take into account the com- 
plex cellular interactions that govern normal 
tissue behaviour or that lead to the develop- 
ment of diseases such as cancer. On page 185 
of this issue, Beronja et al.¹ describe the first
in vivo genome-wide RNAi screen, which was 
performed in mouse embryos with the aim of 
identifying genes involved in the development 
of non-melanoma skin cancer*.
Non-melanoma skin cancer — one of the 
most prevalent tumour types — occurs in the 
epidermal cells that make up the outer layer of 
the skin. In the developing embryo, the epi- 
dermis forms during mid-gestation from the 
surface ectoderm and gradually matures into 
an impermeable barrier with associated struc- 
tures, such as hair follicles. Early studies of non- 
melanoma skin cancer identified members of 
the Ras gene family as potent drivers of the dis- 
ease’, but a lack of techniques for studying gene 
activity at the single-cell level’ has meant that 
little is known about other genes involved in 
the maintenance of these, and other, tumours. 
*This article and the paper under discussion¹ were
published online on 14 August 2013.

The team presenting the current paper
previously described an approach’ for achiev-
ing RNAi in the developing epidermis. In
this technique, cells in the surface ectoderm 
of mouse embryos are transduced by means 
of lentiviral vectors carrying short hairpin 
RNAs (shRNAs), which are cleaved to RNAi 
sequences once the shRNAs are expressed 
in a cell. The researchers have now extended
this approach to screen the entire genome of 
the epidermal cells, using a library of shRNA 
molecules designed to interfere with every 
mRNA in mouse cells. Because cells in the 
developing epidermis divide in an extremely 
reproducible manner, any RNAi sequence that
alters the normal rate of proliferation can be 
identified as either enriched or depleted in a 
pool of RNAi sequences. Thus, by comparing 
the relative abundance of shRNA sequences 
in the embryos’ genomic DNA at the time of 
RNAi induction (embryonic day 9.5) and nine 
days later, the authors were able to identify 
genes that, when inhibited, cause proliferative 
advantages or disadvantages. 
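
The comparison of shRNA abundance at the two time points can be sketched in a few lines of Python. This is an illustrative outline, not the authors' analysis pipeline: the function names, the pseudocount, the two-fold threshold and the read counts are all hypothetical.

```python
import math


def log2_fold_changes(initial_counts, final_counts, pseudocount=1.0):
    """Return log2(final fraction / initial fraction) for each shRNA.

    Counts are converted to fractions of the pool so that differences in
    sequencing depth between the two time points cancel out. The pseudocount
    (an assumption) avoids division by zero for shRNAs that drop out.
    """
    names = sorted(set(initial_counts) | set(final_counts))
    start = {n: initial_counts.get(n, 0) + pseudocount for n in names}
    end = {n: final_counts.get(n, 0) + pseudocount for n in names}
    start_total = sum(start.values())
    end_total = sum(end.values())
    return {
        n: math.log2((end[n] / end_total) / (start[n] / start_total))
        for n in names
    }


def classify(fold_changes, threshold=1.0):
    """Label each shRNA as enriched, depleted or neutral (hypothetical cut-off)."""
    labels = {}
    for name, lfc in fold_changes.items():
        if lfc >= threshold:
            labels[name] = "enriched"    # knockdown gave a proliferative advantage
        elif lfc <= -threshold:
            labels[name] = "depleted"    # knockdown gave a proliferative disadvantage
        else:
            labels[name] = "neutral"
    return labels


# Made-up read counts at RNAi induction and nine days later.
at_induction = {"shRNA_A": 100, "shRNA_B": 100, "shRNA_C": 100}
nine_days_later = {"shRNA_A": 400, "shRNA_B": 100, "shRNA_C": 25}

labels = classify(log2_fold_changes(at_induction, nine_days_later))
print(labels)
```

Working with pool fractions rather than raw counts is the key design choice here: an shRNA counts as enriched only if it grows relative to the rest of the library, which is what a proliferative advantage would produce.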

Beronja and colleagues conducted two such 
screens: one to assess genes involved in nor- 
mal epidermal development and the other to 
assess genes implicated in the development of 
tumours induced by enhanced expression of 
Hras1. The top candidate gene to emerge in 
both screens, but with an opposite effect on 
proliferation, was that encoding the β-catenin
protein. This finding is in line with previous
reports” showing that β-catenin is required
for the development of non-melanoma skin 
cancer and that loss of this protein specifically 
in the epidermis causes hyperproliferation. 

β-Catenin is an integral component of
adherens junctions — protein complexes 
that form between cells in epithelial and 
endothelial tissues — and is therefore key to 


[Figure panel labels: Normal epidermal cells; Enhanced proliferation]

Figure 1 | Role of β-catenin in normal and cancer-prone skin cells. Beronja and colleagues’ in vivo
whole-genome RNA interference (RNAi) screen¹ of epidermal cells expressing the oncogene Hras
identified β-catenin as essential for maintaining cellular hyperproliferation in this model of cancer;
inhibiting expression of the protein led to decreased proliferation. The authors infer that this effect is
mediated through β-catenin’s involvement in the Wnt signalling pathway. By contrast, the same screen in
normal epidermal cells revealed that RNAi-induced inhibition of β-catenin led to enhanced proliferation
in such cells. Follow-up experiments suggest that this effect results from β-catenin’s role in maintaining
the adherens-junction protein complexes between epidermal cells; disruption of these junctions impairs
the contact-inhibition processes that keep cell proliferation in check.

intercellular adhesion. It is also a cofactor of the
Lef1/Tcf family of transcription factors, which 
are effector molecules in signalling pathways 
involving Wnt proteins’. β-Catenin thereby
regulates several fundamental cellular pro- 
cesses. In the authors’ screen for regulators of 
normal epidermal-cell behaviour, β-catenin
emerges as a negative regulator of epidermal- 
cell growth (Fig. 1). Follow-up in vitro experi- 
ments suggested that this effect is independent 
of Wnt signalling, and that it stems from a loss 
of contact-mediated inhibition of proliferation. 
By contrast, the authors show that β-catenin
is required for Hras1-induced hyperprolif- 
eration, and that this correlates with increased 
Wnt signalling in vivo. Intriguingly, however, 
a previous study showed’ that blocking Wnt 
signalling in the epidermis by expression of 
an inhibitory form of Lef1 does not impede 
epidermal tumour formation. Future work is 
therefore needed to evaluate the involvement 
of β-catenin and Wnt signalling in tumour
development. 

Whereas β-catenin has already been impli-
cated in numerous malignancies, Beronja and
colleagues’ screen identified another gene,
Mllt6, as also being involved in Hras1-induced
epidermal-cell hyperproliferation. Chromo- 
somal translocations of this gene have previ- 
ously been associated with mixed myeloid 
leukaemia’. Although the exact function of the 
Mllt6 protein is unclear, it is known to influ-
ence the subcellular localization of the Dot1a–
histone methyltransferase protein complex
and to regulate specific gene networks”. Like 
β-catenin, Mllt6 is required for tumour growth
induced by Ras-gene mutations. The exist- 
ence of multiple Lef1/Tcf binding sites in the 
Mllt6 promoter indicates that it is a potential
downstream target of β-catenin-mediated Wnt
signalling. However, further investigations 
are required to determine whether Mllt6 is
directly regulated by β-catenin, whether con-
certed action of the two proteins drives hyper-
proliferation, or whether Mllt6 is an essential
component of another signalling pathway. This 
will be important to resolve, because it influ- 
ences the number of potential downstream 
drug targets. 

The bird’s-eye perspective taken by Beronja 
et al. to identify regulators of growth has pro- 
duced a wealth of information. The extensive 
data sets generated in their study provide 
the basis for further comprehensive com- 
putational analysis. A subset of the genes 
identified is probably involved in epidermal- 
appendage formation, rather than strictly in 
proliferation. Filtering the data on the basis of 
gene-expression patterns during this phase of 
embryo development would provide an excel- 
lent framework for categorizing genes into 
those with functions in homeostasis, develop- 
ment and oncogene-induced growth. More- 
over, it will be exciting to see whether the genes 
implicated in Ras-protein-induced growth 
represent universal mechanisms for abnormal 


proliferation, or if different tissues have 
evolved discrete mechanisms for controlling 
homeostasis.
Boosting X-ray emission


A spectroscopic technique has been demonstrated that uses stimulated 
emission to enhance weak X-ray signals for fundamental studies in materials
science. SEE LETTER P.191


ERNST FILL 


During the past decade or so, a spectro-
scopic technique called resonant
inelastic X-ray scattering (RIXS) has
evolved as a powerful tool for probing elemen- 
tary excitations in materials'”. Excitations 
accessible to RIXS include charge transfer, 
crystal-field excitations, magnons, molecular 
vibrations and even phonons. Recent progress 
in this field is due to the availability of third- 
generation synchrotrons as X-ray sources 
and to advances in X-ray photon detection’. 
However, the method still suffers from a low
photon yield because of dominant radiation- 
less processes, requiring high incident photon 
fluxes that cause damage to the sample under 
investigation. On page 191 of this issue, Beye 
et al.’ demonstrate an ingenious way to greatly 
increase the photon yield for RIXS. 

In RIXS, a beam of X-ray photons is directed 
at a sample, and the photon energy is chosen 
such that photon absorption elevates an elec- 
tron from a deep-lying atomic-core energy 
level to a previously unoccupied state in the 
material's valence energy band. This process 
generates a hole — a particle created by the 
absence of an electron — in the atomic core. 
This highly unstable state typically decays 
within a few femtoseconds (1 fs is 10⁻¹⁵ s; in
the current experiment the core-hole lifetime 
is 19 fs). Usually, the main decay channel is 
the Auger process, in which an electron jumps 
to the core hole and another electron is emit- 
ted. RIXS relies on a different decay process 
in which, again, an electron jumps to the core 
but, instead of an electron, a photon is emitted 
(Fig. 1a). The fluorescence yield (the ratio of
the rate of radiative decay to total decay rate)
is very low, and thus the RIXS signal is weak.
Nevertheless, RIXS provides a large amount
of information on the chemical and physical
features of materials.

Figure 1 | RIXS spectroscopy using stimulated
emission. a, In conventional RIXS, an incident
photon generates a hole in a deep-lying
atomic-core energy level and an electron in the
unoccupied part of the valence energy band
(left). An electron from the occupied part of the
valence band fills the core hole, leaving a hole in
the valence band and emitting a photon (right).
The emitted photon provides information about
energy states of the underlying material. b, In
Beye and colleagues’ experiment’, the downward
transition is amplified by stimulated emission.
A brilliant soft-X-ray free-electron laser (FEL)
beam is focused on a crystalline silicon sample
and penetrates only about 1 micrometre into
the sample (yellow). A high percentage of atoms
is excited, allowing stimulated emission to
predominate over spontaneous emission and other
decay channels. This results in a highly directional
emission close to the surface.

In their study, Beye et al. irradiated a solid- 
state sample with a soft-X-ray free-electron 
laser (FEL), which allows a large fraction of 
atoms to be excited. In this way, a mechanism 
known as stimulated emission takes over and 
the radiative channel is significantly enhanced 
at the expense of non-radiative processes; 
in stimulated emission, an incident photon 
de-excites an excited state by generating a 
second photon with exactly the same energy 
and direction. FEL radiation had previously 
been proposed for excitation’. FELs emit bril- 
liant, tunable, monoenergetic photon beams 
that can be focused to spot sizes smaller than 
10 micrometres, inducing a high density of 
photons (intensity) on the sample. But Beye 
and colleagues went one step further and 
applied FEL radiation to generate the sizeable 
atomic population inversion that is required to 
induce X-ray amplification (gain) and gener- 
ate laser light at the energies of the transitions 
investigated. 
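
The effect on the fluorescence yield can be illustrated with a toy calculation. The sketch below simply applies the definition given earlier (radiative decay rate divided by total decay rate) and models stimulated emission as a multiplicative boost to the radiative rate. The boost factor and the example rates are hypothetical numbers chosen for illustration, not values from Beye and colleagues' experiment.

```python
def fluorescence_yield(radiative_rate, nonradiative_rate, stimulated_boost=1.0):
    """Fraction of core-hole decays that emit a photon.

    stimulated_boost is a hypothetical factor by which stimulated emission
    multiplies the spontaneous radiative rate (1.0 means no stimulation).
    """
    if radiative_rate < 0 or nonradiative_rate < 0 or stimulated_boost < 0:
        raise ValueError("rates and boost must be non-negative")
    effective_radiative = radiative_rate * stimulated_boost
    total = effective_radiative + nonradiative_rate
    if total == 0:
        raise ValueError("at least one decay rate must be positive")
    return effective_radiative / total


# Arbitrary units: radiationless (Auger-like) decay dominates by 999 to 1.
spontaneous = fluorescence_yield(1.0, 999.0)
stimulated = fluorescence_yield(1.0, 999.0, stimulated_boost=1000.0)

print(f"spontaneous yield: {spontaneous:.4f}")
print(f"stimulated yield:  {stimulated:.4f}")
```

Treating stimulation as a single constant factor is a simplification. In practice the stimulated rate would depend on the photon density in the emitting mode, but the sketch captures the qualitative point that a faster radiative channel takes a larger share of the decays.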

The method is related to the concept of an 
inner-shell X-ray laser. This laser scheme had 
already been suggested in the 1960s’, but has 
only quite recently been realized, also using 
FEL radiation’. In the current experiment, the 
gain generated is so high that the laser transi- 
tion becomes saturated — that is, almost all of 
the excited states are converted into photons 
and the non-radiative channel is effectively 
quenched. The emitted radiation is highly 
directional and radiated close to the sample 
surface (Fig. 1b). Directionality and ampli- 
fication may lead to an enhancement of the 
brilliance of the emitted photon beam by 
several orders of magnitude, solving the 
problem of the weak RIXS signal. 

In Beye and colleagues’ pioneering experi- 
ment, the conditions for observing stimulated 
emission were not optimal. Owing to experi- 
mental constraints, the spectroscopic obser- 
vation had to be limited to a detection angle 
to the surface of 15°, whereas the emission 
maximum was found to occur at 9°. Because 
of this problem, an enhancement of only a 
bit more than a factor of two was measured. 
However, the high directionality of the beam 
was demonstrated clearly, showing the effect 
of stimulated emission. 

The authors’ experiment is an important 
first step towards a new way of using RIXS for 
fundamental studies. Possible improvements 
include an elongation of the irradiated spot for 
increasing the directionality; travelling-wave 
excitation (in which the excitation sweeps over 
the sample at the speed of light, with the gener- 
ated radiation exactly following the excitation) 
to avoid decay of population inversion before 
the beam has left the excited material; and a
better match of the observation direction to 
the emission direction. 

An open question is how the signal in 
various spectral regions might be altered by 
the stimulated-emission process. If all transi- 
tions are ensured to be saturated, this may not 
be a problem. Another challenge pertains to 
the high directionality of the emission: how 
can the information on the excitations, which 
depends on the change in momentum that the 
photons undergo, be retrieved under this con- 
dition? In ordinary RIXS, photons are emitted 
in different directions and such information is 
derived from this feature. 

Another point concerns the incident- 
photon energy at which stimulated-emis- 
sion RIXS can be carried out. With a photon 
energy of around 100 electronvolts, Beye and 
colleagues’ experiment falls into the very- 
low-energy regime of RIXS. Higher photon 
energies are already available with FELs — for 
example, at the Linac Coherent Light Source at 
the SLAC National Accelerator Laboratory in 
Menlo Park, California, which emits photons 
with energies of up to 10 keV (ref. 6). With 
more-energetic photons, stimulated-emission 
RIXS may revolutionize investigations of many 
chemical elements. However, the shorter life- 
time of the excited states that are reached at 
such photon energies will make inducing stimulated
emission more challenging. Application
of the technique may be extended to liquids or 
even gases in order to investigate free atoms or 
molecules’. Owing to the method's increased 
spectral resolution, the investigation of 
new low-energy excitations (for example, 
phonons with energies much below 0.1 eV) 
may be possible. 

This work could open up a new chapter in 
RIXS studies and lead to the discovery of novel 
excitations. A further exciting prospect is the 
potential to make time-resolved measurements
using two incident photon beams’.
Lipopolysaccharide
sensing on the inside 


Host-cell detection of lipopolysaccharide in the outer membrane of Gram- 
negative bacteria was thought to be restricted to the cell-surface receptor TLR4. 
It emerges that lipopolysaccharide can also be sensed in the cytoplasm. 


VIJAY A. K. RATHINAM 
& KATHERINE A. FITZGERALD 


The innate immune system provides an
exquisite defence system against microorganisms,
but its excessive engagement can be dangerous.
Overproduction of
cell-signalling molecules called cytokines 
and excessive release of other components of 
dying cells can directly or indirectly injure host 
tissues and, in extreme cases, may lead to blood 
poisoning, or sepsis. The primary trigger of 
such reactions is lipopolysaccharide, the main 
component of the outer membrane of Gram- 
negative bacteria. Genetic studies suggested 
that this component is sensed exclusively 
through Toll-like receptor 4 (TLR4), a member
of an ancient receptor family dedicated to the 
detection of infectious microorganisms”, but 
there were some hints of TLR4-independent 
sensing of lipopolysaccharide’. 


In an exciting paper published in Science, 
Kayagaki et al.* describe a TLR4-independent 
sensing mechanism for lipopolysaccharide 
(LPS) that occurs in the cytoplasm of macro- 
phages — a type of innate immune cell*. 
Although the identity of the novel LPS recep- 
tor remains unknown, the authors show that 
its engagement results in activation of the 
inflammatory enzyme caspase-11. The find- 
ings greatly enhance our understanding of LPS 
responses and may have implications for the 
treatment of sepsis. 

The discovery emerged from the authors’ 
examination of inflammasome activation 
during Gram-negative bacterial infections. 
Inflammasomes are large, multi-protein com- 
plexes found in the cytoplasm that couple 
pathogen recognition to the maturation of 
IL-1β and other cytokines. Inflammasomes


*This News & Views article was published online 
on 4 September 2013. 
also trigger an inflammatory cell-death
response called pyroptosis. The best-studied 
inflammasome contains the protein NLRP3, a 
sensor of microbial infection that engages the 
adapter molecule ASC, which then recruits 
and activates the effector enzyme caspase-1. 
Although most activators of NLRP3 lead to 
caspase-1 engagement by this mechanism, 
previous work* from the research group pre- 
senting the current paper showed that, dur- 
ing infections with gut pathogens such as 
Escherichia coli and Vibrio cholerae, engage- 
ment of caspase-11 was essential to facilitate 
NLRP3-ASC-dependent caspase-1 activation 
and IL-1β maturation (Fig. 1). The caspase-11
pathway was also shown to play a part in 
LPS-induced septic shock. 

Several other studies* * subsequently linked 
caspase-11 to inflammasome responses for a 
broad range of Gram-negative (but not Gram- 
positive) bacterial pathogens, and attempted 
to understand how caspase-11 is activated 
in these infections. Kayagaki and colleagues 
now show that cytoplasmic LPS recognition 
has a role. 

The findings came by a circuitous route. 
The group’s previous work’ had identified 
cholera toxin B (CTB) as a trigger of the cas- 
pase-11 pathway in cells exposed to LPS. The 
latest work describes how one form of LPS, 
O111:B4, facilitates caspase-11 activation by 
CTB, whereas other LPS serotypes or lipid A — 
the portion of LPS that is responsible for TLR4 
activation — do not. It turns out that CTB is 
not itself a trigger of caspase-11 activation but 
instead acts as a vehicle to deliver O111:B4 LPS 
into the cell. This finding led the authors to 
speculate that cytoplasmic LPS is the trigger 
that activates caspase-11, and they validated 
this idea by showing that LPS and lipid A 
led to caspase-11 activation when they were 
delivered directly to the cytoplasm. 

Following experiments showing that a 
modified form of LPS that still activates 
TLR4 failed to trigger the cytoplasmic LPS- 
sensing pathway, Kayagaki et al. hypothesized 
that cytoplasmic LPS is sensed in a TLR4- 
independent manner. In support of this idea, 
they show that TLR4-deficient macrophages 
undergo normal caspase-11-pathway activa- 
tion in response to LPS or lipid A, as long as 
the cells are first primed with ligands of other 
Toll-like receptors, TLR2 or TLR3, to induce 
expression of NLRP3, IL-1β and caspase-11.
The authors also provide compelling evidence 
linking the cytoplasmic LPS-sensing pathway 
to the recognition of Gram-negative bacteria, 
by monitoring caspase-11 activation in primed 
macrophages infected either with mutant 
E. coli that lacked biologically active LPS or 
with wild-type E. coli. Both normal and TLR4- 
deficient macrophages responded to the wild- 
type bacteria, but the mutant E. coli failed to 
drive responses that depended on caspase-11. 

It has been known for more than a decade 
that TLR4-deficient mice survive doses of LPS 
Figure 1 | Inflammasome activation by
caspase-11. Infections with Gram-negative 
bacteria trigger the formation of an inflammasome 
protein complex comprising NLRP3, ASC and 
procaspase-1 in host cells. Active caspase-1 
subsequently triggers the maturation of cytokines 
such as IL-1. During responses to Gram-negative 
bacteria, including Escherichia coli, the progression 
of procaspase-1 to caspase-1 requires caspase-11. 
Caspase-11 expression is regulated by a signalling 
pathway involving the adapter protein TRIF and 
type-1 interferon proteins. This is initiated following 
the recognition of lipopolysaccharide (LPS), a 
major component of the outer membrane of Gram- 
negative bacteria, by the cell-surface receptor TLR4 
in complex with its co-receptor MD2. Kayagaki
et al.* show that LPS can also be sensed in a TLR4-independent
manner in the cytoplasm of host cells
through an as-yet-unidentified receptor, and that 
this leads to the activation of caspase-11. In addition 
to its role in triggering caspase-1 activation, 
caspase-11 can cause host-cell death in a caspase-1- 
independent way. 


that are lethal to normal animals, and Kayagaki 
and colleagues’ earlier work found similar 
responses in mice that are missing caspase-11. 
The authors assessed the relevance of the cyto- 
solic LPS-sensing pathway to these responses 
in TLR4-deficient or wild-type mice by first 
priming the mice with a non-lethal dose of a 
TLR3 ligand, to induce sufficient caspase-11 
expression. Under these conditions, the 
TLR4-deficient mice were as susceptible as 
wild-type mice to LPS-induced septic shock.

Collectively, these findings provide com- 
pelling evidence for an LPS sensor in the 
cytoplasm that is involved in detecting Gram- 
negative bacterial infections, and highlight the 
importance of both TLR4 and this sensor in 
eliciting the severe inflammatory responses 
that result in LPS-induced mortality (Fig. 1). 
The study also reinforces an emerging theme 
in our understanding of antimicrobial defence: 
that multiple sensors recognize the same 
microbial product in a way that is specific to 
different cellular compartments. This strat- 
egy might help the host to gauge the severity 
of microbial invasion and tailor the response 
so that it is commensurate with the threat. For 
example, a tiny quantity of LPS will drive a 
TLR4-dependent proinflammatory response 
that alerts the host to the presence of infec- 
tion. Higher quantities of LPS, which reach 
the cytoplasm, will trigger inflammasome acti- 
vation, IL-1 production and, ultimately, cell 
death. This strategy might also be deliberate, 
because the cytoplasmic LPS-sensing pathway 
seems to be much more detrimental to the host 
than TLR4-based recognition. As a result, the 
cytoplasmic pathway would only be engaged 
following an overwhelming infection. 

Several pressing questions arise from this 
study. The most crucial is: what is the identity 
of the sensor? We lack a detailed understand- 
ing of its ability to coordinate caspase-11 acti- 
vation. Another key issue not yet addressed 
is whether cytoplasmic LPS triggers similar 
events in human cells. If it does, this under- 
standing may be useful in the development 
of drugs for treating sepsis. For example, 
eritoran, a highly specific TLR4-inhibiting 
drug, recently failed in human clinical trials 
that aimed to reduce deaths from sepsis"; 
could this be because eritoran fails to block 
the cytoplasmic pathway of LPS recognition? 
About one-third of cases of septic shock result 
from Gram-negative bacterial infections, and 
these cases have a high mortality rate. A better 
understanding of the clinical relevance of 
the caspase-11 pathway in Gram-negative 
bacterial recognition in humans may improve 
the outcome of these infections.
Sulphur from
heaven and hell 


Fingerprints of sulphur isotopes in rocks from the ridge beneath the Atlantic 
Ocean suggest that a substantial fraction of sulphur at Earth’s surface is left 
over from the formation of the planet’s core. SEE LETTER P.208 


NICOLAS DAUPHAS 


The relative abundances and isotopic
signatures of sulphur found in rocks
exhumed from Earth’s mantle act as
fingerprints of how and where our planet 
acquired its complement of this biochemically 
important element. Textbooks say that Earth’s 
sulphur has the same isotopic composition as 
chondrites, a type of meteorite that is thought 
to best represent the building blocks of ter- 
restrial planets. With dashes of audacity and 
insight, Labidi and co-workers¹ question this
assumption in a paper on page 208 of this issue. 
They report measurements of rocks derived 
from Earth’s mantle that reveal isotopic patterns
unlike any others previously reported,
with implications for how the planet formed*. 

Most of us take Earth’s chemical composi- 
tion for granted, but unique circumstances 
may have been at play in delivering the right 
mix of elements for life to develop and flourish. 
For example, this is true of the noble metals 
needed in the electronic devices used to print 
or display this page. It is also true of sulphur, 
which is found in the amino acids cysteine and 
methionine and represents around 0.2% of our 
body weight. These elements and a few others 


*This article and the paper under discussion¹ were
published online on 4 September 2013. 
Figure 1 | Abundances of elements in Earth’s mantle and in chondritic meteorites. The elements
shown have strong affinities with metallic iron®'””” and are in order of increasing volatility. Shaded region
indicates elements that are not noble metals. Data are normalized to the abundance of aluminium (Al),
because this element was neither lost to space during Earth’s accretion nor scavenged by metallic iron 
during core formation. The data are also normalized to solar abundances to prevent them from spanning 
many orders of magnitude. The ‘silicate Earth’ pattern shows abundances in the mantle, and has been 
multiplied by a factor of 300 to allow easy comparison with the chondrite data. The depicted elements are 
thought to have segregated into Earth's core as the planet formed, but were subsequently replenished in the 
mantle by the accretion of chondritic material — which is why these elements are found in proportions 
similar to those of chondrites. However, Labidi et al.¹ suggest that about half or more of the sulphur in
Earth’s mantle may be left over from core formation (black arrow), with the remainder coming from 
chondrites; the upper part (light red) for sulphur represents the total abundance of the element in the 
mantle, and the lower part indicates the abundance left over from core formation. 
50 Years Ago


Prof. L. Egyed has recently 
summarized a number of 
hypotheses concerning the 
expansion of the Earth, and has 
suggested that the Earth's radius is 
expanding at a rate of 0.5–1.0 mm
per year. There appears to be
a remarkably close agreement
between the rate of increase of
the Earth's radius and that of the
universe according to Hubble’s
law. Using the at present accepted
value for Hubble's constant,
H = 100 km/s/megaparsec, which
is 1.65 × 10⁻⁴ mm per year per
mile, and substituting the value of
the Earth’s radius in the Hubble
equation, v = RH, we obtain a radial
expansion for the Earth of 0.66 mm
per year. While this agreement
may be fortuitous it may suggest a 
fundamental concordance between 
expansion processes in the Earth's 
core and those responsible for the 
expansion of the universe. 

From Nature 14 September 1963 
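
The arithmetic in the 1963 note can be checked with a short sketch. The unit conversions below (kilometres per megaparsec, seconds per year, kilometres per mile) and the mean Earth radius of 6,371 km are standard values assumed here, not figures given in the note, and the function name is hypothetical.

```python
# Sketch of the v = RH arithmetic from the 1963 note.
# Assumed standard values (not stated in the note itself):
KM_PER_MPC = 3.0857e19                  # kilometres in one megaparsec
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year
KM_PER_MILE = 1.609344
EARTH_RADIUS_KM = 6371.0                # mean radius

def hubble_rate_per_year(h_km_s_mpc: float) -> float:
    """Convert a Hubble constant in km/s/Mpc to a fractional rate per year."""
    return h_km_s_mpc / KM_PER_MPC * SECONDS_PER_YEAR

rate = hubble_rate_per_year(100.0)       # H = 100 km/s/Mpc, as in the note
MM_PER_KM = 1e6

mm_per_year_per_mile = rate * KM_PER_MILE * MM_PER_KM
earth_mm_per_year = rate * EARTH_RADIUS_KM * MM_PER_KM   # v = R * H

print(f"{mm_per_year_per_mile:.3g} mm per year per mile")
print(f"{earth_mm_per_year:.2f} mm per year for the Earth's radius")
```

With these assumed constants the per-mile figure agrees with the note's 1.65 × 10⁻⁴ mm, and the radial value comes out close to the quoted 0.66 mm per year; the small difference depends on the radius adopted.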


100 Years Ago 


In a paper on the psychology of
insects, read before the General
Malarial Committee at Madras in
November, 1912, Prof. Howlett,
after giving an account of
experiments carried out by him
on the response of insects to
stimuli, comes to the conclusion
that insects are to be regarded “not
as intelligent beings consciously
shaping a path through life, but as
being in a sort of active hypnotic
trance.” It is claimed that this view
of insect-psychology opens up
great possibilities in the study of
insect carriers of disease, since
“it is no intelligent foe we have
to fight, but a mere battalion of
somnambulists.” If we discover the
stimuli or particular conditions
which determine the actions of an
insect, we can apply them to its
undoing.

From Nature 11 September 1913 


have strong affinities for metallic iron — so 
much so that when Earth’s metallic, iron-rich 
core formed, it scavenged all these elements, 
leaving behind a rocky mantle barren of sul- 
phur, selenium, tellurium and noble metals. 
The prevailing view regarding the origin of 
these elements is that they were replenished 
in the mantle by a rain of asteroids from the 
heavens, known as the late veneer °. This late 
accretion of meteoritic material may also have 
delivered a fraction of the elements needed for 
life (hydrogen, carbon and nitrogen), as well 
as prebiotic organic molecules that could have 
served as the seeds for life. 

The late-veneer hypothesis for the origin of 
Earth’s sulphur is supported by strong observa- 
tional evidence. First, laboratory experiments 
to reproduce the conditions of core-mantle 
segregation indicate that sulphur, selenium, 
tellurium and the noble metals are efficiently 
scavenged into metallic iron. Second, these 
elements are present in the mantle in propor- 
tions similar to those found in chondrites 
(Fig. 1). And third, the isotopic composition 
of sulphur in mantle rocks is identical to that 
of chondrites®”. It is this third point that Labidi 
et al. call into question. They find that the ratio 
of sulphur-34 to sulphur-32 in Earth’s mantle is 
0.13% lower than that of chondrites. 

To arrive at this conclusion, the authors 
used an analytical technique that provides a 
more complete recovery of sulphur from rock 
samples than has previously been possible. The 
difference in sulphur isotopic ratios that they 
infer between Earth’s mantle and meteorites 
corresponds approximately to the difference 
measured in laboratory experiments when 
sulphur is partitioned between (core-like) 
metal and (mantle-like) silicate. An appeal- 
ing possibility is therefore that a large fraction 
(maybe half or more) of the sulphur in Earth’s 
mantle originated in the bowels of the Earth 
— that is, it is left over from core formation. If
correct, there is no need to invoke unique cir- 
cumstances to explain the presence of sulphur 
at Earth’s surface. Furthermore, this element 
should be ubiquitous in Earth-like planets, 
raising the possibility of one day detecting 
sulphur-bearing molecules in the atmospheres 
of such extrasolar planets. 

A difficulty with a dynamically active planet 
such as Earth is that geological processes, for 
example partial melting of the interior to form 
magmas and recycling of surface rocks into the 
interior at subduction zones, can blur isotopic 
signals and complicate interpretations. For 
instance, Labidi and co-workers identify the 
isotopic signatures of two distinct sulphur- 
containing components in rocks formed by 
melting of the mantle. One has a sulphur- 
isotope composition distinct from that of 
chondrites, which they interpret to be rep- 
resentative of the mantle. The other has a 
sulphur-isotope composition similar to that 
of chondrites, but the authors ascribe it to 
recycling of sulphur from sediments. This 
interpretation is reasonable, but a question
lingers as to whether or not these components 
are representative of their mantle sources. Dur- 
ing magma formation and extraction from 
the mantle, considerable amounts of sulphide 
minerals can remain behind at the magma’s 
source, potentially affecting the sulphur iso- 
topic ratios of magma-derived rocks*. Labidi 
et al. argue against this interpretation of their 
results, but insufficient experimental data 
are available on sulphur-isotope partition- 
ing between sulphide and silicate melts to 
definitely rule out this possibility. 

The relative abundances of selenium, sulphur 
and tellurium in the mantle have been used to 
gain insight into the nature of the late veneer”. 
These relative abundances best match the com- 
position of carbonaceous chondrites (Fig. 1), 
suggesting that Earth received a late veneer 
of material rich in volatile compounds and 
organic molecules. Other isotopic evidence” 
suggests that the nature of the meteoritic mate- 
rial accreted by Earth was not very different 
before and after the completion of core for- 
mation. If Labidi and colleagues are correct 
and a substantial fraction of sulphur in the 
mantle is left over from core formation, this 
undermines the argument for a volatile-rich 
late veneer.

However, one is left wondering whether the 
good match between the selenium/sulphur 
and tellurium/sulphur ratios of chondrites and 
those of Earth can be coincidental. Labidi and 
co-workers’ measurements are of the highest 
quality and will endure, but the same can be 
said of another study’, published earlier this 
year, the conclusions of which contradict the 
present findings. The questions raised by these 
two conflicting studies will undoubtedly stimulate
further discussion and experiments.
Extracts of
meteorite


The Sutter’s Mill meteorite exploded in a
dazzling fireball over California last year
(pictured). Writing in Proceedings of the
National Academy of Sciences, Pizzarello
et al. report that it contained organic
molecules not found in any previously
analysed meteorites (S. Pizzarello et al.
Proc. Natl Acad. Sci. USA
http://dx.doi.org/10.1073/pnas.1309113110; 2013).
The authors’ analysis of extracts of organic 
matter from the meteorite revealed a complex 
mixture of oxygen-rich compounds, which 
probably resulted from oxidative processes 
that occurred in the parent body. Intriguingly, 
the authors suggest that such compounds 
could have been released to prebiotic Earth 
during or after meteorite bombardments, 
adding to the roster of organic molecules 
that might have contributed to the evolution 
of life. Andrew Mitchinson 
A recipe for ligand-binding proteins


Cellular cross-talk, enzymatic catalysis and regulation of gene expression all 
depend on molecular recognition. A method that allows the design of proteins 
with desired recognition sites could thus be revolutionary. SEE LETTER P.212 


GIOVANNA GHIRLANDA 


To support cellular processes, natural
proteins have evolved to recognize a
relatively small set of ligand molecules
with high affinity and specificity. Broadening
this set of protein-ligand pairs with synthetic
proteins that are specific for ligands
of choice could transform the development 
of biosensors, protein-based drugs, artifi- 
cial enzymes and tools for chemical biology. 
Current in vitro approaches to the design of 
protein-ligand pairs rely on laborious directed- 
evolution techniques to engineer binding sites 
in existing protein scaffolds, and cannot be generalized.
In this issue, Tinberg et al.¹ (page 212)
describe a computational method that prom- 
ises to streamline and greatly facilitate the 


*This article and the paper under discussion¹ were
published online on 4 September 2013. 


development of tailored ligand-binding 
proteins*. 

One way to engineer synthetic ligand- 
binding proteins is repurposing existing 
proteins through directed evolution. This 
approach involves selecting a few amino acids 
within an existing protein, typically located 
in a cavity, and randomly mutating them to
create libraries of millions of protein vari- 
ants, which are then selected for their ability 
to bind the target. Depending on the ligand 
and the scaffold protein, successive rounds of 
directed evolution yield binding proteins with 
dissociation constants in the physiologically 
relevant micromolar to nanomolar ranges”. 
The process is arduous, and success is not guar- 
anteed: current methods limit the size of com- 
binatorial libraries to about 10”, which is often 
insufficient to explore a large number of positions
and so limits the fitness of the resulting
binders. Furthermore, the process works best
when the starting protein already binds weakly 
to the desired ligand, rather than when starting 
from scratch. 

Computational methods hold great prom- 
ise to bypass this lengthy process by using vir- 
tual selection on a large number of scaffolds. 
Such approaches have enjoyed much success 
lately, with the design of artificial enzymes for 
a variety of reactions* *. Despite these feats, 
the design of high-affinity ligand-binding 
proteins has remained elusive’. 

Tinberg et al. now present a generalizable 
strategy for designing artificial recognition 
modules for any desired target molecule. 
As their target they chose digoxigenin — a 
natural steroid that forms part of the cardiac 
drug digoxin, and that has found a second life 
as a recognition tag in biotechnology appli- 
cations. Consequently, a few digoxigenin- 
binding agents, including both antibodies 
and non-antibody proteins, have been devel- 
oped through traditional protein-engineering 
methods”. 

Taking some clues from the previously 
developed binding agents, Tinberg and col- 
leagues sketched out a couple of possible 
ways to bind ligands to digoxigenin using 
amino acids that could provide hydrogen 
bonds and hydrophobic interactions (Fig. 1a). 
They then transposed these minimalistic sets 
into three-dimensional models, which were 
used as probes to interrogate a selected set of 
401 proteins, encompassing several classes of 
Figure 1 | Steps to making a ligand-binding protein. a, To design a protein
with high binding affinity for the steroid digoxigenin, Tinberg et al.¹ created
a minimal digoxigenin-binding site made with amino-acid side chains
positioned to maximize hydrogen bonding and hydrophobic interactions
to the target. They then sifted through a selected set of 401 scaffold-protein
structures to find all (17) possible matches for the digoxigenin-binding site


protein fold, for geometrical compatibility. 
Powerful computational methods available in 
the authors’ lab enabled them to rapidly sift 
through all possible matches, rank the can- 
didates and perform optimization through 
rounds of redesign and scoring. 

Seventeen proteins emerged as candi- 
dates for experimental characterization, on 
the basis of optimal shape complementarity 
to the ligand, pre-organization of the site in 
the absence of ligand and predicted protein 
stability. Of these, only two bound the ligand 
with micromolar affinity (Fig. 1b). Although 
an impressive result for a computational 
approach, these binding affinities are far from 
the nanomolar ranges needed for practical 
applications. The authors, therefore, turned 
to directed-evolution methods to improve 
binding affinity. 

To explore the effect of all 20 possible amino 
acids at each of 34 residues located within 
roughly 10 angstroms of the binding site would 
have required an astronomical number (more 
than 10⁴⁴) of possible protein sequences. Fortunately,
a new method” allowed Tinberg et al.
to circumvent this limitation — by randomiz- 
ing each of the 34 positions independently, 
selecting for binding to labelled digoxigenin, 
and using deep sequencing to build a fitness 
map that correlates amino-acid identity at 
each position to function. On the basis of this 
information, the authors combined favour- 
able substitutions in successive generations 
of libraries to obtain mutants with affinity 
in the low-nanomolar to high-picomolar 
ranges (Fig. 1c).
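
The scale of the problem described above can be illustrated with a minimal counting sketch. The figures of 34 positions and 20 amino acids come from the text; the function names are hypothetical, and counting 20 variants per position (wild type included) is a simplifying assumption rather than a description of the authors' actual library.

```python
# Illustrative count only: compares exhaustive combinatorial mutagenesis
# with randomizing one position at a time.
AMINO_ACIDS = 20   # from the text
POSITIONS = 34     # residues within roughly 10 angstroms of the binding site

def full_combinatorial(positions: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of sequences if every position varies simultaneously."""
    return alphabet ** positions

def single_site_scan(positions: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of variants if each position is randomized independently."""
    return positions * alphabet

full = full_combinatorial(POSITIONS)
scan = single_site_scan(POSITIONS)

print(f"{full:.2e}")   # 1.72e+44
print(scan)            # 680
```

Randomizing positions independently reduces the search to hundreds of variants, at the cost of missing interactions between positions; in the work described, favourable substitutions were then combined in later library generations.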

The litmus test for measuring successful 
protein design is obtaining high-resolution 
structures of the final product and comparing 
it with the designed features in the models. 
Tinberg and colleagues’ proteins pass this test 
with flying colours: structural characteriza- 
tion of the two best mutants, DIG10.2 and 




and ranked them by best fit. b, After validating the predictions through 
protein expression, they subjected the two best performers, which had 
micromolar binding affinities (Kd), to directed-evolution methods,
randomizing amino acids farther away from the binding site. c, Finally, a 
protein with picomolar binding affinity emerged as winner, and the authors 
obtained its crystal structure to confirm the design features. 


its improved version, DIG10.3, showed the
presence of hydrogen bonding between three 
designed tyrosine residues and the ligand. 
Tyrosine is frequently found in the binding 
pockets of antibodies and of synthetic ligand- 
binding proteins: its ability to form hydro- 
phobic and hydrogen-bonding interactions, 
as well as its size and conformational rigidity, 
fulfil many roles in molecular recognition”. In 
addition to the designed features, subjecting 
the proteins to directed evolution allowed the 
exchange of small hydrophobic amino acids 
with larger ones, and the introduction of two 
additional tyrosines within the binding site. 
These mutations reinforce design principles 
incorporated in the computational selection, 
such as shape complementarity and binding- 
site pre-organization, which maximize the 
entropic contribution to binding as well as 
maximizing specificity. 

Tinberg et al. also exploited hydrogen- 
bonding interactions, which impart strong 
geometric constraints and thus contribute 
to specificity for binding to digoxigenin 
as opposed to other steroids. This point is 
elegantly illustrated by a series of mutants in 
which the authors correlate the number of 
hydrogen-bonding groups on the target ster- 
oids with the effect of swapping each tyrosine 
for phenylalanine, which cannot hydrogen 
bond, and demonstrate the large energetic 
penalty paid for each mismatch. 

The success of this protocol overcomes a 
long-standing issue in protein design, and 
promises to have an impact on the effective- 
ness of designed enzymes, which has often 
been limited by low affinity for the substrate. 
This work also opens up many exciting chal- 
lenges. Digoxigenin proved a wise choice of 
target ligand, and contributed to the authors’ 
success: the molecule offers several hydrogen-bonding
‘hooks’, which helped to achieve high
binding specificity in the protein, and the 
steroid’s rigid scaffold helped to maximize
entropic contributions to binding. The authors 
were also able to gather information on what 
the binding site should look like from exist- 
ing structures. Future work will, no doubt, 
implement ways to deal with flexible ligands 
and to design minimalistic binding sites in 
the absence of structural information. More 
powerful computational methods would 
enable affinity maturation in silico rather 
than in vitro, and would overcome the need 
for an existing scaffold as a starting point, by 
designing novel proteins around each target 
ligand’. Finally, molecular recognition could 
be coupled to functional activity, for example 
by using small-molecule binding to modulate 
protein activity.
Social reward requires coordinated
activity of nucleus accumbens oxytocin
and serotonin


Gül Dölen¹†, Ayeh Darvishzadeh¹, Kee Wui Huang¹ & Robert C. Malenka¹


Social behaviours in species as diverse as honey bees and humans promote group survival but often come at some cost to 
the individual. Although reinforcement of adaptive social interactions is ostensibly required for the evolutionary persis- 
tence of these behaviours, the neural mechanisms by which social reward is encoded by the brain are largely unknown. 
Here we demonstrate that in mice oxytocin acts as a social reinforcement signal within the nucleus accumbens core, 
where it elicits a presynaptically expressed long-term depression of excitatory synaptic transmission in medium spiny 
neurons. Although the nucleus accumbens receives oxytocin-receptor-containing inputs from several brain regions, 
genetic deletion of these receptors specifically from dorsal raphe nucleus, which provides serotonergic (5-hydroxytryptamine; 
5-HT) innervation to the nucleus accumbens, abolishes the reinforcing properties of social interaction. Furthermore, 
oxytocin-induced synaptic plasticity requires activation of nucleus accumbens 5-HT1B receptors, the blockade of which
prevents social reward. These results demonstrate that the rewarding properties of social interaction in mice require the 
coordinated activity of oxytocin and 5-HT in the nucleus accumbens, a mechanistic insight with implications for
understanding the pathogenesis of social dysfunction in neuropsychiatric disorders such as autism.


The mesocorticolimbic (MCL) circuit, implicated in encoding the rewar- 
ding properties of addictive drugs, is likely to have evolved to motivate 
behaviours that were important for survival and reproduction. Such 
incentive behaviours include eating, drinking and copulation, and are 
reinforced by so-called ‘natural rewards’ (for example, food, water, 
pheromones)'. Growing evidence suggests that social interaction itself 
can act as a natural reward’. However, given the diversity of social behaviours
(for example, parental investment, mating, cooperation) and
the selection pressures that shaped their emergence (reproductive, 
predation, limited resources)’, it remains unclear whether evolutiona- 
rily conserved neural mechanisms exist to encode social reward. 

An important clue comes from studies that have related pair-bonding 
behaviour in prairie voles (Microtus ochrogaster) to elevated expression 
of oxytocin receptors (OTRs) in the nucleus accumbens (NAc), a key 
component of the brain’s MCL reward circuit*. However, the species- 
specific nature of this mating behaviour and the reported paucity of 
OTR expression in the NAc of mice**® question the relevance of NAc
OTRs to consociate social behaviours. This topic is of particular interest 
given that polymorphisms in the OTR gene have been associated with 
autism spectrum disorders, which are characterized by profound social 
deficits, and may be amenable to treatment with oxytocin (OT)’. 

Mice are social animals: they live in consociate ‘demes’ consisting of
five to ten adult members that share territorial defence® and allopa- 
rental responsibilities’, and exhibit several behaviours (for example, 
vocal communication, imitation, and empathy)’*” that are the hall- 
marks of sociality. As in several other species including humans, OT 
has been linked to social behaviours in mice’. However, OT and OTR 
knockout mice show a number of related behavioural deficits (such as 
memory impairment, anxiety, stress, aggressivity)° that make it diffi- 
cult to parse the function of OT as a social reward signal in the central 
nervous system. To examine the hypothesis that OT signalling in mice 
is required for the rewarding properties of social interactions, we used
a conditioned place preference (CPP) assay that has traditionally been
used to study the rewarding properties of drugs of abuse’? and recently 
has been expanded to include social reward". 


Social reward requires oxytocin 

Male wild-type mice were conditioned for social CPP (Fig. 1a, b) while 
receiving intraperitoneal injections of either saline or the OTR antagonist
(OTR-A), L-368,899 hydrochloride (5 mg kg⁻¹, twice a day for
2 days). Saline-treated wild-type mice showed a robust place preference 
for the socially conditioned context, whereas OTR-A treated mice showed 
no preference (Fig. 1c-e). Neither locomotor activity (Supplementary 
Fig. 1a-i) nor cocaine CPP (Supplementary Fig. 2a-d) was altered by
OTR-A treatment, demonstrating the specificity of the effects of OTR- 
A for the social domain. Furthermore, OTR-A, but not saline, localized to 
the NAc using Andalman probes (Fig. 1f and Supplementary Fig. 3a, b) 
prevented social CPP (Fig. 1g-i), demonstrating that OT action in the 
NAc is required for consociate social reward. 

Given the known species- and sex-specific variation in OTR 
expression””'>’®, it is notable that no study so far®'”-”” has determined 
whether hypothalamic OTergic inputs to the NAc exist in male mice. 
Here we injected recombinant rabies virus expressing enhanced green 
fluorescent protein (eGFP) (RBV-eGFP) into the NAc, where it is taken 
up by presynaptic terminals and retrogradely transported to cell bodies 
(Supplementary Fig. 4). In a substantial subset of hypothalamic neu- 
rons in the paraventricular nucleus (PVN), but not the supraoptic 
nucleus (SON), robust eGFP expression co-localizes with OT, indi- 
cating a direct axonal OTergic projection to the NAc (Supplementary 
Figs 4 and 5). Furthermore, these results suggest that it is the magno- 
cellular projection from the SON that distinguishes prairie voles® from 
rats’? and mice. Although they do not rule out an additional contri- 
bution of paracrine release, our findings demonstrate a significant synap- 
tic source for OT in the NAc of male mice. 


¹Nancy Pritzker Laboratory, Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, 265 Campus Drive, Stanford, California 94305, USA. †Present address: Department
of Neuroscience, Johns Hopkins University, 855 North Wolfe Street, Baltimore, Maryland 21205, USA.
Figure 1 | Oxytocin is required for social CPP. a-i, Protocol for social CPP
(a). Experimental time course of intraperitoneal (i.p.) injections (b) and NAc
reverse microdialysis (f) in social CPP. Individual (top) and average (bottom)
responses in animals receiving i.p. (c) or NAc (g) saline versus
animals receiving i.p. (d) or NAc (h) OTR-A. For both i.p. and NAc delivery
routes, saline- but not OTR-A-treated animals spend more time in social
bedding cue following conditioning (n = 18 i.p. saline, n = 15 i.p. OTR-A; n = 9
NAc saline and n = 11 NAc OTR-A animals). Values below 900 indicate that
subjects preferred isolate bedding; values above 900 indicate that subjects
preferred social bedding. e, i, Comparisons between treatment (Rx) groups
reveal significantly decreased normalized and subtracted social preference in
both i.p. and NAc OTR-A-treated animals. Summary data are presented as
mean ± s.e.m. (*P < 0.05, Student’s t-test). Each arrowhead indicates an
application of saline or OTR-A.
Oxytocin induces presynaptic LTD in NAc MSNs


To interrogate directly the synaptic role of OT within the NAc, we 
recorded excitatory postsynaptic currents (EPSCs) from NAc medium 
spiny neurons (MSNs) in acute slices. Bath application of OT (1 µM,
10 min) caused a long-term depression (LTD) of EPSCs that was blocked
(Fig. 2a-c) but not reversed (Fig. 2d-f) by the OTR-A L-368,899 hydrochloride
(1 µM, continuous or 10 min application, respectively). The
magnitude of this oxytocin-induced LTD was significantly decreased 
in slices from socially conditioned versus isolation conditioned animals 
(Fig. 2g-i), consistent with the hypothesis that social experience elicits 
or influences the generation of OT-LTD. 

To determine whether social experience preferentially influenced OT- 
LTD in one of the two major components of the basal ganglia circuit, 
direct (D1-receptor-expressing) versus indirect (D2-receptor-expressing) 
pathway MSNs”, targeted recordings were made from NAc slices pre- 
pared from bacterial artificial chromosome (BAC) transgenic D1- 
TdTomato, and D2-eGFP reporter mice*’”’. Application of OT induced 
robust LTD (Fig. 2j-l) and isolation conditioning resulted in increased 
magnitude of OT-LTD (Supplementary Fig. 6) in both D1- and D2- 
receptor-expressing MSNs, suggesting that these phenomena do not 
display direct and indirect pathway specificity. 

To determine whether the OT-LTD was expressed pre- or postsynap- 
tically, we performed a number of standard electrophysiological synaptic 
assays. The frequency, but not the amplitude, of miniature EPSCs was
significantly decreased by OT application (Fig. 2m-q). Furthermore, both 
the paired-pulse ratio (PPR) of EPSCs (50-ms inter-stimulus interval) and 
the coefficient of variation of the EPSCs increased following OT applica- 
tion (Fig. 2r-t). Together, these findings suggest that OT-LTD results 
from a decrease in presynaptic neurotransmitter release probability. 
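
The two presynaptic indicators used above can be illustrated with a minimal sketch. It assumes the conventional definitions, which the text does not spell out: the paired-pulse ratio is the mean second-EPSC amplitude divided by the mean first-EPSC amplitude, and the coefficient of variation is the sample standard deviation divided by the mean. The function names and all amplitudes are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def paired_pulse_ratio(first_amps, second_amps):
    """PPR: mean amplitude of the second EPSC over mean amplitude of the first."""
    return mean(second_amps) / mean(first_amps)


def coefficient_of_variation(amps):
    """CV: sample standard deviation of the amplitudes divided by their mean."""
    return stdev(amps) / mean(amps)


# Hypothetical EPSC amplitudes (pA, absolute values); illustrative only.
baseline_first = [100.0, 110.0, 90.0, 105.0, 95.0]
baseline_second = [120.0, 125.0, 115.0, 122.0, 118.0]
post_ot_first = [60.0, 75.0, 45.0, 70.0, 50.0]
post_ot_second = [90.0, 95.0, 85.0, 92.0, 88.0]

ppr_before = paired_pulse_ratio(baseline_first, baseline_second)
ppr_after = paired_pulse_ratio(post_ot_first, post_ot_second)
cv_before = coefficient_of_variation(baseline_first)
cv_after = coefficient_of_variation(post_ot_first)
```

In this made-up example both measures rise after the simulated OT application, which is the pattern usually read as a fall in release probability rather than a change in postsynaptic responsiveness.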


Social reward requires presynaptic OTRs in NAc 


Anatomical studies have revealed sparse expression of OTRs in mouse 
NAc**. Moreover, immunostaining in OTR-Venus reporter mice” 
indicates that the small subset of cells that do express OTRs in the NAc 
are either inhibitory interneurons or glial cells (Supplementary Fig. 7). 
To test the hypothesis that OTRs in the NAc are preferentially loca- 
lized to presynaptic boutons deriving from afferent inputs, we injected 
TdTomato-expressing RBV (RBV-TdTomato) into the NAc of OTR- 
Venus reporter mice (Supplementary Fig. 8). Cellular co-localization 
of TdTomato and Venus was detected in several, but not all, brain 
regions projecting to the NAc (Supplementary Fig. 8), identifying a 
number of putative sources of presynaptic OTRs in the NAc. 

To extend the anatomical mapping of OTRs to their functional role 
in social reward in vivo, we used conditional OTR knockout mice”* 
combined with Cre recombinase (Cre)-expressing RBV or adeno- 
associated virus (AAV) injected into the NAc, an approach that enabled 
selective ablation of pre- or postsynaptic NAc OTRs, respectively. 
Normal social CPP was observed in both sham-injected wild-type and 
conditional OTR mice (Fig. 3a-d). Injection of the AAV-Cre-eGFP to
delete OTRs from cells within the NAc did not affect social CPP in either 
wild-type or conditional OTR mice (Fig. 3e-h). Consistent with this lack 
of effect of deleting OTRs from cells within the NAc, OT application did 
not induce long-lasting changes at inhibitory synapses onto MSNs 
(Supplementary Fig. 9). In contrast, injection of RBV-Cre-eGFP to 
delete presynaptic OTRs in the NAc completely blocked social CPP in
conditional OTR knockout mice but had no effect in wild-type mice 
(Fig. 3i-l). Injection sites and viral expression were confirmed for all 
animals (Supplementary Figs 10 and 11). Considered together with 
the pharmacological results showing OTRs within the NAc are 
required for social CPP (Fig. 1f-i), these results indicate that OTRs 
on presynaptic boutons within the NAc are required for social reward. 


Social CPP and LTD require dorsal raphe inputs and 
5HT1B receptors

To determine which of the afferent inputs expressing OTRs identi- 
fied by RBV-mediated molecular ablation are required for social CPP, 
(a, d, g, j), summary time course (b, e, h, k), and average post-treatment
magnitude comparisons (c, f, i, l) reveal significant EPSC response depression
in oxytocin-treated but not OTR-A-pre-incubated cells (a-c, n = 6 OT
(oxytocin), n = 6 OT + OTR-A pre-incubation cells). OT-response depression
is not reversed by post-induction OTR-A chase (d-f, n = 7 cells). The
magnitude of OT-LTD is significantly increased in cells from isolation versus
socially reared animals (g-i, isolate, n = 14, social n = 27 cells). The magnitude
of EPSC OT-LTD is not different in D1 versus D2 MSNs (j-l, n = 9 D1, and


we next injected AAV-Cre-eGFP into selected brain regions of conditional
OTR mice. Deleting OTRs in either the anterior cingulate cortex
or the ventral subiculum had no effect on social CPP (Supplementary 
Fig. 12), whereas AAV-Cre-eGFP injections into the dorsal raphe nucleus 
of conditional OTR mice, but not wild-type mice, prevented social CPP 
(Fig. 4a-d). This same manipulation also significantly reduced OT- 
LTD recorded ex vivo (Fig. 4e-g). Together these results provide support 
for the hypothesis that presynaptic OTRs on dorsal raphe nucleus axon 
terminals within the NAc are specifically required for social reward. 
Since the dorsal raphe nucleus is one of the major sources of serotonin 
(5-HT) in the brain, we further characterized NAc projection neurons 
in the dorsal raphe nucleus and found substantial overlap between 
OTR- and 5-HT-expressing cells (Supplementary Fig. 13), raising 
the possibility of coordinated activity of these transmitters in the NAc. 
Given that 5HT1B receptors have been implicated in social beha- 
viours~*° and autism’, and their activation elicits a presynaptic LTD 
in the striatum*’, we reasoned that OT may induce LTD in the NAc 
through activation of 5HT1B receptors. Consistent with previous 
n = 11 D2 cells). m-q, Representative miniature EPSC traces (m), cumulative
probability (n, o), and average (p, q) comparisons reveal that miniature EPSC
frequency (n, p), but not amplitude (o, q), is decreased in OT-treated versus
control cells (control, n = 11, OT, n = 11 cells). r-t, Comparisons of
representative traces (r) and average (s) paired-pulse ratios, PPR (n = 6 cells), as
well as average (t) coefficient of variance, CV (n = 32 cells), reveal significant
increases following induction of OT-LTD. Summary data are presented as
mean ± s.e.m. (*P < 0.05, Student’s t-test). Numbered traces (1, 2 and 3) were
taken at the times indicated by numbers below the graphs.


results”, application of the 5HT1B selective agonist CP-93129 
induced robust LTD in NAc MSNs (Fig. 5a-c). Subsequent application
of OT caused no further depression (Fig. 5a-c), suggesting that
the 5HT1B receptor-induced LTD had occluded OT-LTD. To test
whether OT-LTD required release of 5HT within the NAc, we applied
the 5HT1B receptor antagonist NAS-181 (20 µM) to NAc slices, a
manipulation that largely prevented the LTD normally induced by
OT (Fig. 5d-f). In contrast, 5HT1B receptor-induced LTD was readily
induced in slices in which OTRs had been pharmacologically blocked 
(Fig. 5g-i) or molecularly ablated from dorsal raphe nucleus projec- 
tions (Supplementary Fig. 14). Application of NAS-181 also prevented 
the decrease in miniature EPSC frequency normally elicited by OT 
(Fig. 5j-n), but had no effect on its own (Supplementary Fig. 15). 
Together these results support the hypothesis that activation of OTRs 
on the terminals of dorsal raphe nucleus axons within the NAc leads to 
a 5HT1B receptor-dependent form of LTD (Supplementary Fig. 16) 
and that this synaptic modulation is necessary for social reward, as 
measured by social CPP. A strong prediction of this hypothesis is that 
Figure 3 | Presynaptic OTRs are required for social CPP. a-l, Experimental
time course for sham (a), NAc AAV-Cre-eGFP injection showing AAV particle and
spread of Cre-eGFP from injection site (e), and NAc RBV-Cre-eGFP injection showing
RBV particle and spread of Cre-eGFP from injection site (i). Individual (top) and
average (bottom) responses in wild-type (WT) (b, f, j) versus conditional OTR (cOTR)
(c, g, k) animals receiving sham (b, c), NAc AAV-Cre-eGFP (f, g) or NAc RBV-Cre-eGFP
(j, k). WT animals, as well as sham and NAc AAV-Cre-eGFP-injected cOTR
animals, but not cOTR animals injected with NAc RBV-Cre-eGFP, spend more time in
the social bedding cue following conditioning (sham WT, n = 15, cOTR, n = 8; NAc
AAV-Cre-eGFP WT, n = 15, cOTR, n = 19; NAc RBV-Cre-eGFP WT, n = 14, cOTR,
n = 22 animals). d, h, l, Comparisons between WT and cOTR animals reveal normal
social CPP in sham and NAc AAV-Cre-eGFP-injected animals, whereas in NAc
RBV-Cre-eGFP-injected animals social CPP is significantly decreased in cOTR versus WT
controls. Summary data are presented as mean ± s.e.m. (*P < 0.05, Student's t-test).
Figure 4 | NAc OTRs in presynaptic terminals originating from the dorsal
raphe nucleus are required for social CPP and OT-LTD. a, Experimental
time course of dorsal raphe nucleus (dRph) AAV-Cre-eGFP injections in social
CPP. b, c, Individual (top) and average (bottom) comparisons reveal that dRph
AAV-Cre-eGFP-injected WT (b), but not cOTR (c), animals spend significantly
more time in the social bedding cue following conditioning (WT, n = 14,
cOTR, n = 10 animals). d, Comparisons between dRph AAV-Cre-eGFP-injected
groups reveal significantly decreased social CPP in cOTR animals
compared to WT controls. e-g, Representative traces (e), summary time course
(f) and average post-treatment magnitude comparisons (g) reveal absence of
OT-LTD in EPSCs recorded from dRph AAV-Cre-eGFP-injected cOTR
knockout versus pooled WT control animals (dRph AAV-Cre-eGFP-injected
cOTR, n = 6 cells; pooled WT control, n = 30 cells). Summary data are
presented as mean ± s.e.m. (*P < 0.05, Student’s t-test).


blockade of 5HT1B receptors within the NAc should prevent social
CPP. Consistent with this prediction, NAS-181, but not saline, infu- 
sions into the NAc during conditioning (Fig. 6a) prevented the occur- 
rence of social CPP (Fig. 6b-d). 


Concluding remarks 


We have demonstrated that the coordinated activity of OT and 5-HT 
is required for the reward associated with social interactions and modi- 
fies MCL circuit properties by generating LTD of excitatory synapses 
onto MSNs in the NAc. Moreover, our findings specifically implicate 
OT-mediated 5-HT release in the NAc in the regulation of social reward. 
Since OT-LTD occurs in both D1- and D2-receptor-expressing MSN 
subtypes, as does 5HT1B-LTD”, these results suggest that social reward 
is not expressly governed by the dichotomies proposed by prevailing 
models of striatal function”. Indeed, the two-pathway framework for 
striatal function is almost certainly oversimplified” and computational 
Figure 5 | OT-LTD in NAc requires 5HT1B receptors. a-i, Representative
traces (a, d, g), summary time course (b, e, h) and average post-treatment
magnitude comparisons (c, f, i) reveal that EPSC depression in cells treated with
5HTR1B agonist (CP-93129 dihydrochloride) is not augmented by subsequent
application of OT (a-c, n = 5 cells); OT-LTD is significantly reduced in cells
pre-treated with the 5HTR1B antagonist (NAS-181) (d-f, control, n = 7,
5HTR1B antagonist, n = 7 cells); 5HTR1B-mediated LTD induced by
application of CP-93129 is not affected by pharmacological blockade of OTRs
(g-i, n = 5 cells). j-n, Representative miniature EPSC traces (j), cumulative
probability (k, l), and average (m, n) comparisons reveal miniature EPSC
frequency (k, m), but not amplitude (l, n), is decreased in OT-treated cells
versus cells treated with OT in the presence of NAS-181 (OT, n = 17,
OT + 5HTR1B-A, n = 17 cells). Summary data are presented as mean ± s.e.m.
(*P < 0.05, Student’s t-test).


modelling studies have proposed that reinforcement learning engages 
multiple neuromodulatory reward circuits in parallel*’. Furthermore, 
5-HT and dopamine systems may represent reward in fundamentally 
Figure 6 | Social CPP requires NAc 5HT1B receptors. a, Experimental time
course of NAc reverse microdialysis. b, c, Individual (top) and average (bottom)
responses in animals receiving NAc saline (b) versus 5HTR1B antagonist
(5HTR1B-A) (c). Saline-treated animals, but not 5HTR1B-A-treated animals,
spend more time in social bedding cue following conditioning (NAc saline,
n = 20, NAc 5HTR1B-A, n = 26 animals). d, Comparisons between treatment
groups reveal significantly decreased normalized and subtracted social
preference in NAc 5HTR1B-A-treated animals compared to saline controls.
Summary data are presented as mean ± s.e.m. (*P < 0.05, Student’s t-test).


different ways***°. Future studies examining the interplay between 
dopamine and 5-HT in the regulation of social reward will therefore 
be informative. 

In light of estimates that the shift to social living preceded the 
emergence of pair-living by 35 million years’, we suggest that the NAc- 
dependent social reward mechanisms described here are the prede- 
cessors of evolutionary specializations seen in prairie voles***°. These 
mechanisms utilize presynaptically localized OTRs, which couple to 
G-proteins’, and thus may have been overlooked by previous studies 
that relied on receptor autoradiography and transcript tagging to 
conclude that OTRs do not exist in the NAc of consociate species like 
mice”. Moreover, as it is these antecedent social behaviours that are 
disrupted in neuropsychiatric diseases such as autism”’, the elucida- 
tion of the neural mechanisms mediating social reward is a critical step 
towards the development of rational, mechanism-based treatments 
for brain disorders that involve dysfunction in social behaviours. 


METHODS SUMMARY 


All procedures were conducted in accordance with the animal care standards set 
forth by the National Institutes of Health and were approved by Stanford University’s 
Administrative Panel on Laboratory Animal Care. Male young adult mice (4 to 6 
weeks of age) on a C57BL/6 background were used for all studies. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS

Animals. Male young adult (4 to 6 weeks of age) C57BL/6 (Charles River), DRD1A- 
TdTomato BAC transgenic”’ (D1-TdTomato, gift of N. Calakos), DRD2-eGFP 
BAC transgenic” (D2-eGFP), Oxtrtm1.1Wsy homozygous” (conditional OTR 
knockout, Jackson Laboratory), or OTR Venus Neo/+” (heterozygous OTR- 
Venus reporter, gift of L. J. Young) mice backcrossed to C57BL/6 were used for 
all experiments. All procedures complied with the animal care standards set forth 
by the National Institutes of Health and were approved by Stanford University’s 
Administrative Panel on Laboratory Animal Care. All animals were maintained 
on a 12 h-12 h light-dark cycle. Experimenters were blind to the treatment condition
when subjective criteria were used as a component of data analysis, and
control and test conditions were interleaved for all experiments. 

Behavioural assays. The protocol for social conditioned place preference (social CPP) 
was shortened to 2 days of conditioning (Fig. 1a) from 10 days of conditioning’*”*. 
Animals were weaned (or delivered from Charles River) at 3 weeks of age into 
‘home’ cages containing 3 to 5 cage-mates, and housed on corncob bedding (Bed- 
O’Cobs, 0.125 inches, PharmaServ). One to two weeks later, animals were subjected 
to experimental manipulations and returned to their home cage (all cage-mates were 
of the same genotype and received the same experimental manipulation). Animals 
were then placed in an open field activity chamber (ENV-510, Med Associates)
equipped with infrared beams and a software interface (Activity Monitor, Med 
Associates) that monitors the position of the mouse. The apparatus was divided 
into two equally sized zones using a clear plastic wall, with a 5-cm diameter circular 
opening at the base; each zone contained one type of novel bedding (Alpha-Dri, 
PharmaServ; Alpha Chip, PharmaServ; Bed-O’Cobs, 0.25 inches, PharmaServ; or
Kaytee Soft Granule, Petco). The amount of time spent freely exploring each zone 
was recorded during 30-min test sessions. After an initial test (pre-conditioning 
trial) to establish baseline preference for the two sets of bedding cues, mice were 
assigned to receive social conditioning (with cage-mates) for 24 h on one type of
bedding, followed by 24 h of isolate conditioning (without cage-mates) on the
other type of bedding. Bedding assignments (social versus isolate) were counterbalanced
for an unbiased design. Twenty-four hours later, animals received a 30-min
post-conditioning trial to establish preference for the two conditioned cues.
Animals were excluded (pre-established criteria) if they exhibited a pre-conditioning 
preference score of >1.5 or <0.5 (for an unbiased procedure); pre-conditioning 
versus post-conditioning social preference scores were considered significant if 
paired Student’s t-test P values were <0.05. Comparisons between experimental
conditions were made using both normalized social preference scores (time spent 
in social zone; post-trial divided by pre-trial), and subtracted social preference 
scores (time spent in social zone; post-trial minus pre-trial); these were considered 
significant if unpaired Student’s t-test (two conditions), or analysis of variance
(ANOVA) (three conditions, Supplementary Fig. 12) P values were <0.05. 
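
The scoring described in this paragraph can be sketched as follows. The normalized and subtracted scores follow the definitions given in the text (post-trial divided by, or minus, pre-trial time in the social zone). The exclusion check rests on an assumption that the text does not state: it treats the pre-conditioning preference score as time in one zone divided by time in the other, with the whole 30-min session spent in one of the two zones. All function names and times are hypothetical.

```python
SESSION_S = 30 * 60  # 30-min test session, in seconds


def pre_conditioning_score(social_zone_s, session_s=SESSION_S):
    """Assumed definition: time in the future social zone over time in the other zone."""
    return social_zone_s / (session_s - social_zone_s)


def is_excluded(social_zone_s):
    """Apply the pre-established criterion: score > 1.5 or < 0.5."""
    score = pre_conditioning_score(social_zone_s)
    return score > 1.5 or score < 0.5


def social_preference(pre_s, post_s):
    """Return normalized (post / pre) and subtracted (post - pre) scores."""
    return {"normalized": post_s / pre_s, "subtracted": post_s - pre_s}


# Hypothetical animal: 900 s in the social zone before and 1080 s after conditioning.
scores = social_preference(pre_s=900.0, post_s=1080.0)
unbiased_animal_excluded = is_excluded(900.0)   # score 1.0, retained
biased_animal_excluded = is_excluded(1200.0)    # score 2.0, excluded
```

Group comparisons would then be run on these per-animal scores with the t-tests or ANOVA named in the text.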

For cocaine-conditioned place preference (cocaine CPP), the apparatus was 
divided into two equally sized zones using plastic floor tiles with distinct visual 
and tactile cues (grey and smooth, or white and rough). After 5 days of saline 
injections twice a day for habituation in the home cage, the amount of time spent 
freely exploring each zone was recorded during 30-min test sessions. After an 
initial test to establish baseline preference for the two sets of cues, mice in each of 
the two treatment groups (intraperitoneal saline or intraperitoneal OTR-A) were 
randomly assigned in a counterbalanced fashion to receive cocaine (20 mg kg⁻¹)
or saline in the presence of one set of cues (that is, an unbiased design). The second 
conditioning session was conducted 24h later in the presence of the other set of 
cues. The post-conditioning test session was conducted 24h after the second 
conditioning session to determine time spent in the presence of the cocaine- versus
saline-associated cue. Isolation and socially housed animals were not different in
terms of cocaine CPP so they were pooled for further analysis. Pre-conditioning, 
post-conditioning, subtracted, and normalized cocaine preference scores were 
calculated as for social CPP. 

Andalman probes. Modified Andalman probes were constructed as described 
previously (Supplementary Fig. 3). In brief, probes consisted of a reservoir 
(Polypropylene Luer Hub) attached to a double cannula guide (C235gs, 26GA, 
C/C distance 2 mm, 5 mm pedestal, cut 4 mm below pedestal, custom specified for 
mouse bilateral NAc coordinates, Plastics One). Polyimide tubing (40 American 
Wire Gauge, 0.0031 inches internal diameter, 0.0046 inches outside diameter, 
0.00075 inches Wall, Small Parts) was threaded through the stainless steel tubing 
of the cannula guide on one end, and out of a hole drilled into the luer hub to act as 
a flush outlet (outflow tube) on the other end. The dialysis membrane (Spectra/ 
Por, 13-kD molecular weight cut-off, Spectrum Laboratories) was then threaded 
over the outflow tube and through the cannula guide; ends were cut such that 
~500 µm of dialysis membrane was exposed below the cannula guide and above
the sealed end. Junctions were sealed with bio-compatible epoxies (Epo-Tek 730, 
Epo-Tek 301, Epoxy Technologies). In this design, a pharmacological agent could 
be intracranially delivered rapidly, continually and concurrently to all members
of the social group, without anaesthesia. 

Once male mice reached postnatal day 35 to 40, probes were implanted into
the NAc following bilateral craniotomy (bregma 1.54 mm; lateral
1.0 mm) and attached to the skull using dental acrylic. Previous reports indicate
that for complete pharmacological effect, drug concentration in the reservoir must 
be ~500 times the dose used for direct injections’, thus OTR and 5HTR1B
antagonists were applied at 10 mM (L-368,899) and 85 mM (NAS-181) concentrations
in a volume of 25 µl saline. Probe placement and competency were verified by
post-hoc application of concentrated fluorescein sodium salt (Sigma-Aldrich) to
the reservoir before intracardial PFA perfusion and histology (Supplementary Fig. 3).
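
As a rough check on the reservoir concentrations quoted above, the sketch below divides each one by the approximate 500-fold factor to obtain the direct-injection concentration it would correspond to. Treating the stated "dose" ratio as a simple concentration ratio is an assumption, the factor itself is only approximate, and the function name is hypothetical.

```python
RESERVOIR_FOLD = 500  # approximate reservoir-to-direct-injection factor from the text


def direct_injection_equivalent_um(reservoir_mm, fold=RESERVOIR_FOLD):
    """Convert a reservoir concentration (mM) to the implied direct-injection value (µM)."""
    return reservoir_mm * 1000.0 / fold


otr_a_um = direct_injection_equivalent_um(10.0)    # L-368,899 reservoir at 10 mM
nas_181_um = direct_injection_equivalent_um(85.0)  # NAS-181 reservoir at 85 mM
```

Under these assumptions the reservoirs correspond to roughly 20 µM L-368,899 and 170 µM NAS-181 at the tissue.
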
Virus generation. Rabies virus (RBV) was generated from a full-length comple- 
mentary DNA plasmid containing all components of RBV (SAD L16; gift from 
K.-K. Conzelmann)*". We replaced the rabies virus glycoprotein with eGFP, TdTomato
or Cre-eGFP to generate RBV expressing eGFP (RBV-eGFP), TdTomato (RBV-TdTomato) or
Cre-eGFP (RBV-Cre-eGFP). To
rescue RBV from this cDNA we used a modified version of a published protocol**”’. 
In brief, HEK293T cells were transfected with a total of six plasmids: four plasmids
expressing the RBV components (pTIT-N, pTIT-P, pTIT-G and pTIT-L); one
plasmid expressing T7 RNA polymerase (pCAGGS-T7); and the aforementioned
glycoprotein-deleted RBV cDNA plasmid expressing Cre-eGFP, eGFP or TdTomato. 
For the amplification of RBV, the media bathing these HEK293T (ATCC) cells were
collected 3 to 4 days post transfection and moved to baby hamster kidney (BHK) 
cells stably expressing RBV glycoprotein (BHK-B19G)”. After 3 days, the media 
from BHK-B19G cells were collected, centrifuged for 5 min at 3,000g to remove cell 
debris, and concentrated by ultracentrifugation (55,000g for 2h). Pellets were 
suspended in Dulbecco’s PBS, aliquoted and stored at −80 °C. The titre of concentrated
RBV was measured by infecting HEK293 cells and monitoring fluorescence.
Plasmids expressing the RBV components were gifts from K.-K. Conzelmann and 
I. Wickersham. BHK cells stably expressing B19G were a gift from E. Callaway. 

The adeno-associated viruses (AAVs) used in this study were produced by the 
Stanford Neuroscience Gene Vector and Virus Core. In brief, AAV-DJ™“ was 
produced by transfection of AAV 293 cells (Agilent) with three plasmids: an AAV 
vector expressing Cre-eGFP, AAV helper plasmid (pHELPER, Agilent), and AAV 
rep-cap helper plasmid (pRC-DJ, gift from M. Kay). At 72 h after transfection, the 
cells were collected and lysed by a freeze-and-thaw procedure. Viral particles were 
then purified by an iodixanol step gradient ultracentrifugation method. The iodix- 
anol was diluted and the AAV was concentrated using a 100-kDa molecular weight 
cut-off ultrafiltration device. The genomic titre was determined by quantitative PCR. 
Stereotaxic injections. Stereotaxic injection of viruses into NAc was performed 
under general ketamine-medetomidine anaesthesia using a stereotaxic instrument 
(David Kopf). A small volume (~1 µl) of concentrated virus solution was injected
bilaterally into NAc core (bregma 1.54 mm; lateral 1.0 mm; ventral 4.0 mm), unilaterally
into the dorsal raphe nucleus (bregma −3.3 mm; lateral 0.0 mm; ventral
3.35 mm), bilaterally into the ventral subiculum (bregma −2.95 mm; lateral 3.1 mm;
ventral 4.35 mm), or bilaterally into the anterior cingulate (bregma 1.0 mm; lateral 0.3 mm;
ventral 1.25 mm) at a slow rate (100 nl per min) using a syringe pump (Harvard 
Apparatus). The injection needle was withdrawn 5 min after the end of the infu- 
sion. Animals were tested 7 days after AAV or RBV injections. Injection sites and 
viral infectivity were confirmed in all animals post-hoc by preparing sections 
(50 µm) containing the relevant brain region (Supplementary Fig. 10).
Immunohistochemistry. Immunohistochemistry and confocal microscopy were 
performed as described previously”. In brief, after intracardial perfusion with 4% 
paraformaldehyde in PBS (pH 7.4), the brains were post fixed overnight in this 
same solution and the following day 50 µm coronal, sagittal or horizontal sections
were prepared. Primary antibodies were used at the following concentrations: 
mouse anti-oxytocin-neurophysin (1:50; gift of H. Gainer**””); rat anti-green fluo- 
rescent protein (GFP, 1:1000; Nacalai); rabbit anti-parvalbumin (1:750; Swant); 
rabbit anti-neuronal nitric oxide synthase (1:100; BD Transduction Laboratories); 
rabbit anti-glial fibrillary protein (1:80; Sigma-Aldrich); rabbit anti-choline acetyltransferase
(1:100; Millipore); rabbit anti-dopamine receptor protein (1:100; Millipore);
sheep anti-tryptophan hydroxylase (1:100; Millipore) diluted in a solution
containing 1% horse serum, 0.2% BSA and 0.5% Triton X-100 in PBS. After over- 
night incubation in primary antibody at room temperature (20-22 °C) with slow 
agitation, slices were washed four times in PBS and then incubated with appro- 
priate secondary antibody diluted at 1:750 for 2h in PBS containing 0.5% Triton 
X-100. Subsequently, slices were washed 5 times and mounted using Vectashield 
mounting medium (Vector Laboratories). To identify cells expressing GFP or 
TdTomato due to the injection of RBV-eGFP or RBV-TdTomato into the NAc, 
raw fluorescence was visualized. Image acquisition was performed with a confocal 
microscope (Zeiss LSM510) using a 10×/0.30 Plan Neofluar and a 40×/1.3 Oil DIC
Plan Apochromat objective. Confocal images were examined using the Zeiss LSM 
Image Browser software. 
Electrophysiology. Parasagittal slices (250 µm) containing the NAc core were
prepared from C57BL/6 and D1-TdTomato/D2-eGFP BAC transgenic mice on a 
C57BL/6 background using standard procedures. In brief, after mice were anaes- 
thetized with isoflurane and decapitated, brains were quickly removed and placed 
in ice-cold low-sodium, high-sucrose dissecting solution. Slices were cut by adhering 
the two sagittal hemispheres of the brain containing the NAc core to the stage of a Leica 
vibroslicer. Slices were allowed to recover for a minimum of 60 min in a submerged 
holding chamber (~25 °C) containing artificial cerebrospinal fluid (ACSF) 
consisting of 119 mM NaCl, 2.5 mM KCl, 2.5 mM CaCl2, 1.3 mM MgSO4, 1 mM 
NaH2PO4, 11 mM glucose and 26.2 mM NaHCO3. Slices were then removed from 
the holding chamber and placed in the recording chamber, where they were 
continuously perfused with oxygenated (95% O2, 5% CO2) ACSF at a rate of 2 ml per 
min at 26 ± 2 °C. For EPSC recordings, bicuculline (20 µM) was added to the ACSF 
to block GABAA (γ-aminobutyric acid type A)-receptor-mediated inhibitory synaptic 
currents. For inhibitory postsynaptic current (IPSC) recordings, 
dl-2-amino-5-phosphonovalerate (dAPV, 10 µM) and 
2,3-dioxo-6-nitro-1,2,3,4-tetrahydrobenzo[f]quinoxaline-7-sulfonamide (NBQX, 5 µM) dissolved in DMSO were added to 
block NMDA and AMPA receptors, respectively. Whole-cell voltage-clamp recordings 
from MSNs were obtained under visual control using a 40× objective. The NAc 
core was identified by the presence of the anterior commissure. D1 and D2 MSNs 
in the NAc core were identified by the presence of TdTomato and eGFP, 
respectively, which were excited with ultraviolet light using bandpass filters (HQ545/30x 
EX (excitation) for TdTomato; HQ470/40 EX for eGFP). Recordings were made 
with electrodes (3.5-6.5 MΩ) filled with 115 mM CsMeSO4, 20 mM CsCl, 10 mM 
HEPES, 0.6 mM EGTA, 2.5 mM MgCl2, 10 mM Na-phosphocreatine, 4 mM Na-ATP, 
0.3 mM Na-GTP, and 1 mM QX-314. Excitatory and inhibitory afferents were 
stimulated with a bipolar nichrome wire electrode placed at the border between the 
NAc core and cortex dorsal to the anterior commissure. Recordings were performed 
using a Multiclamp 700B (Molecular Devices), filtered at 2 kHz and digitized at 
10 kHz. EPSCs were evoked at a frequency of 0.1 Hz while MSNs were 
voltage-clamped at −70 mV. Data acquisition and analysis were performed on-line using 
custom Igor Pro software. Input resistance and access resistance were monitored 
continuously throughout each experiment; experiments were terminated if these 
changed by >15%. 

Summary LTD graphs were generated by averaging the peak amplitudes of 
individual EPSCs in 1-min bins (six consecutive sweeps) and normalizing these to 
the mean value of EPSCs collected during the 10 min baseline immediately before 
the LTD-induction protocol. Individual experiments were then averaged 
together. Oxytocin (Tocris Biosciences, 1 µM, 10 min) was applied via the bath 
following the collection of baseline for induction of OT-LTD. For experiments 
examining the blockade of OT-LTD, slices were pre-incubated in antagonist 
(OTR-A, 1 µM L-368,899 hydrochloride or 5HTR1B-A, 20 µM NAS-181; Tocris 
Biosciences) for at least 30 min before recording. For experiments examining the 
reversal of OT-LTD, 30 to 40 min post induction, OTR-A was bath applied for 
10 min. After the collection of stable baseline EPSCs, 5HT1B-LTD was induced by 
10-min bath application of 2 µM CP-93129 dihydrochloride (Tocris Biosciences) 
as described previously. For experiments examining the occlusion of OT-LTD, 
after stabilization of 5HT1B-LTD (at 30 to 40 min post induction), 1 µM oxytocin 
was applied via the bath for 10 min. Miniature EPSCs were collected at a holding 
potential of −70 mV in the presence of TTX (0.5 µM). Two minutes after break-in 
(sweep number 5, 30-s sweeps), 30-s blocks of events (total of 200 events per cell) 
were acquired and analysed using Mini-analysis software (Synaptosoft) with 
threshold parameters set at 5 pA amplitude and <3 ms rise time. All events 
included in the final data analysis were confirmed to be miniature EPSCs by visual 
examination, based on their rapid rise time and shape. Slices were incubated in the 
appropriate drug (dissolved in ACSF-bicuculline) for 10 min before recording, and 
cross-cell comparisons were made. Paired-pulse ratios (PPRs) were acquired by 
applying a second afferent stimulus of equal intensity, 50 ms after the first stimulus, 
and then calculating the ratio of EPSC2/EPSC1. The coefficient of variation was 
calculated as the standard deviation divided by the average (STDEV/AVG) of 10-min 
blocks (minutes 0-10, pre trial; minutes 40-50, post trial). Comparisons between 
different experimental manipulations were made using a two-tailed Student's t-test 
(paired or unpaired, as appropriate) with P < 0.05 considered to be significant. All 
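
The summary measures described in this section (1-min binning with baseline normalization, paired-pulse ratio and coefficient of variation) reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names and the synthetic amplitudes are assumptions, amplitudes are treated as positive magnitudes, and it is not a reconstruction of the custom Igor Pro routines mentioned above.

```python
from statistics import mean, stdev

SWEEPS_PER_MIN = 6  # 0.1 Hz stimulation gives six sweeps per 1-min bin


def bin_and_normalize(peaks, baseline_min=10):
    """Average peak EPSC amplitudes in 1-min bins, then divide by the baseline mean."""
    bins = [
        mean(peaks[i:i + SWEEPS_PER_MIN])
        for i in range(0, len(peaks) - SWEEPS_PER_MIN + 1, SWEEPS_PER_MIN)
    ]
    baseline = mean(bins[:baseline_min])  # the 10 min before induction
    return [b / baseline for b in bins]


def paired_pulse_ratio(epsc1, epsc2):
    """PPR = EPSC2 / EPSC1 for two stimuli of equal intensity, 50 ms apart."""
    return epsc2 / epsc1


def coefficient_of_variation(amplitudes):
    """Sample standard deviation divided by the mean of a block of amplitudes."""
    return stdev(amplitudes) / mean(amplitudes)


# Hypothetical recording: 10 min of baseline at 100 pA, then 10 min at 60 pA.
peaks = [100.0] * 60 + [60.0] * 60
normalized = bin_and_normalize(peaks)
```

In this toy trace the baseline bins normalize to 1.0 and the later bins to 0.6, which is how a 40% depression would appear on a summary LTD graph.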

RNAi screens in mice identify physiological 
regulators of oncogenic growth 


Slobodan Beronja, Peter Janki, Evan Heller, Wen-Hui Lien, Brice E. Keyes, Naoki Oshimori & Elaine Fuchs 


Tissue growth is the multifaceted outcome of a cell’s intrinsic capabilities and its interactions with the surrounding 
environment. Decoding these complexities is essential for understanding human development and tumorigenesis. Here 
we tackle this problem by carrying out the first genome-wide RNA-interference-mediated screens in mice. Focusing on 
skin development and oncogenic (Hras^G12V-induced) hyperplasia, our screens uncover previously unknown as well as 
anticipated regulators of embryonic epidermal growth. Among the top oncogenic screen hits are Mllt6 and the Wnt 
effector β-catenin, which maintain Hras^G12V-dependent hyperproliferation. We also expose β-catenin as an unanticipated 
antagonist of normal epidermal growth, functioning through Wnt-independent intercellular adhesion. Finally, we validate 
functional significance in mouse and human cancers, thereby establishing the feasibility of in vivo mammalian genome-wide 
investigations to dissect tissue development and tumorigenesis. By documenting some oncogenic growth regulators, we 
pave the way for future investigations of other hits and raise promise for unearthing new targets for cancer therapies. 


Genome-wide cellular RNA interference (RNAi) screening has advanced 
the identification of genes involved in oncogenic growth control. To 
date, however, high-throughput screens in mammalian cells have been 
limited to cultures, in which even the best systems incompletely model 
physiological environments. We have overcome this impediment by 
devising methods to efficiently and selectively transduce murine 
epidermis through in utero lentiviral targeting of progenitors in 
embryonic day (E)9.5 embryos. When coupled with short hairpin RNA 
(shRNA) expression, lentiviral transduction is stably propagated 
throughout skin epithelium, resulting in RNAi-mediated reductions 
in target transcript and protein levels. This enables rapid analysis of 
complex genetic pathways in mammals, something previously only 
possible in lower eukaryotes. 

The correlation between a tissue’s growth and turnover rates and its 
susceptibility to cancer makes embryonic epidermis an attractive model 
for exploring how rapidly growing tissues balance proliferation and 
differentiation, and what prevents them from doing so in tumour 
progression. Given the efficacy of our system in single-gene studies, we have now 
expanded this scale by more than four orders of magnitude to conduct 
genome-wide RNAi screens. Our objectives were: first, to demonstrate the 
feasibility of such screens in mammals; second, to identify epidermal growth 
regulators in their native, physiological environment; third, to uncover how 
epidermal growth control changes when it is propelled by a well-known 
oncogene; and fourth, to demonstrate the implications of our findings for 
tumour progression in mice and humans. 


Epidermal growth is rapid and uniform 
Following completion of gastrulation and continuing to birth, mouse 
surface ectoderm commences rapid growth to match embryo expansion 
(Fig. 1a). Beginning as a monolayer, E9.5 ectoderm differentiates into a 
stratified, multi-layered epidermis that by birth constitutes a barrier that 
retains fluids and excludes microbes. Mature epidermis maintains an 
inner progenitor layer, which fuels tissue homeostasis and wound repair. 
To quantify epidermal growth, we randomly marked single cells at 
clonal density by infecting E9.5 Rosa26^lox-stop-lox-yfp/+ Cre-reporter 
embryos (R26^yfp/+) with an LV-Cre lentivirus, and then monitored 
their expansion during development (Fig. 1b). By E18.5, single yellow 
fluorescent protein (YFP)+ cells at E10.5 had grown to clones 
constituting ~40 cells (Fig. 1b, c; ~5-6 divisions per cell). Variability in 
clone size ranged within 1-2 cell divisions, indicating strikingly 
uniform growth throughout the epidermis. 
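
As a quick check of the division estimate above: if every cell in a clone doubles at each division and none is lost (a simplifying assumption made here for illustration, not a claim from the text), clone size and division number are related by a base-2 logarithm. The helper name is hypothetical.

```python
import math


def divisions_from_clone_size(cells_per_clone):
    """Doublings needed for one founder cell to reach the given clone size."""
    return math.log2(cells_per_clone)


# A clone of ~40 cells sits between five doublings (32 cells) and six (64 cells).
divisions = divisions_from_clone_size(40)
size_after_five = 2 ** 5
size_after_six = 2 ** 6
```

Under this idealized model, a spread of 1-2 divisions between clones corresponds to a two- to fourfold difference in clone size.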

We next examined how growth is affected by oncogenic Hras1, which 
is found mutated in many cancers and is the primary target in skin 
carcinogenesis models. K14-Cre-driven, epidermal-specific expression 
of Hras^G12V from its endogenous locus (Hras^lox-stop-lox-G12V) 
resulted in mice whose skin displayed epidermal overgrowth as well as 
oncogenic Hras dose-dependency for one (K14-cre; Hras^LSL-G12V/+; 
Hras^oncoX1) or two (K14-cre; Hras^LSL-G12V/LSL-G12V; Hras^oncoX2) copies 
(Supplementary Fig. 1). Additional distinctions included the expansion 
of both progenitors (keratin 5+) and differentiating layers (spinous 
keratin 10+; granular filaggrin+) (Supplementary Figs 1-3). 

To quantify the impact of oncogenic Hras on epidermal growth, we 
used a cellular growth index (CGI) assay (Fig. 1d). Cre-reporter 
(R26^yfp/+) embryos, transduced with a LV-Cre and red fluorescent 
protein-expressing lentivirus (LV-RFP) mix, showed similar relative 
numbers of YFP+ to RFP+ keratinocytes across several embryos, 
indicating that control YFP+ and RFP+ populations grew at comparable 
rates (Fig. 1e, f). Transduction of the same lentiviral mixture into test 
animals (where Cre-transduction induces Hras^G12V) revealed 
consistently more YFP+ (Hras^G12V) than RFP+ (control) cells (CGI = 1.82, 
Hras^oncoX1; 3.32, Hras^oncoX2; Fig. 1e, f). These findings demonstrate 
that Hras^G12V confers a dose-dependent growth advantage to skin 
epidermis and that growth rates can be documented and quantified. 
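
One way to read the CGI described above is as the YFP+:RFP+ ratio in test embryos divided by the same ratio in control embryos, so that a neutral manipulation gives a value of 1. That formalization, the function names and the cell counts below are assumptions for illustration; the text does not spell out the exact formula.

```python
from statistics import mean


def yfp_to_rfp_ratio(yfp_cells, rfp_cells):
    """Ratio of Cre-transduced (YFP+) to reference (RFP+) cells in one embryo."""
    return yfp_cells / rfp_cells


def cellular_growth_index(test_embryos, control_embryos):
    """Mean YFP+:RFP+ ratio in test embryos relative to control embryos."""
    control_ratio = mean(yfp_to_rfp_ratio(y, r) for y, r in control_embryos)
    test_ratio = mean(yfp_to_rfp_ratio(y, r) for y, r in test_embryos)
    return test_ratio / control_ratio


# Hypothetical (YFP+, RFP+) counts per embryo.
control = [(500, 500), (420, 400), (380, 400)]
test = [(900, 500), (760, 400), (700, 400)]

cgi = cellular_growth_index(test, control)               # above 1: growth advantage
neutral_cgi = cellular_growth_index(control, control)    # exactly 1: neutral
```

Normalizing to the co-injected RFP+ population is what makes the index insensitive to embryo-to-embryo differences in viral titre.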

Hras^G12V not only conferred a dose-dependent increase in 
proliferation, but also in suppression of apoptosis (Supplementary Fig. 4). 
In addition, consistent with established pro-inflammatory effects of 
Hras^G12V, innate and adaptive immune cells infiltrated underlying 
dermis. Last, real-time PCR revealed no evidence for oncogenic 
induction of cellular senescence-associated cyclin-dependent kinase 
(CDK) inhibitors. 


Establishing pooled screen parameters 


On the basis of our CGI assay principle, we expected that, after 
epidermal transduction with a pool of shRNA-expressing lentivirus, any 
shRNA that targets an essential mediator of growth will be reduced 
or lost during development, whereas shRNAs targeting negative 
growth regulators will become overrepresented. By comparing 
relative shRNA abundance in the initial pool and at E18.5, we expected to 
identify shRNAs and their targets that confer either growth advantage 
or disadvantage. 

Figure 1 | Embryonic epidermal tissue growth is rapid and responsive to 
oncogenic Hras. a, Mouse embryogenesis, highlighted by propidium iodide 
(E9.5) or K14-actin-GFP (E12.5-18.5). b, R26^yfp/+ Cre-reporter embryos 
infected at E9.5 with LV-Cre and analysed at days shown. Transduced cells 
are YFP+. Transduction levels (% YFP+ cells) depend upon viral titre. 
c, Cell numbers in transduced YFP+ clones at ages shown. d, Schematic of 
CGI assay. E9.5 R26^yfp/+ Cre-reporter (control) or gene^lox R26^yfp/+ 
(test) embryos are infected with a LV-Cre and LV-RFP mix. At E18.5, 
numbers of RFP+:YFP+ cells in control and test animals are compared, and 
phenotypes scored as neutral (CGI = 1), growth advantaged (CGI > 1) or 
disadvantaged (CGI < 1). e, Numbers of RFP+ and YFP+ cells at E18.5 in 
control, Hras^oncoX1 and Hras^oncoX2 embryos. Upper shift is consistent 
with growth advantage. f, RFP+ cell numbers normalized to YFP+ cells in 
control, Hras^oncoX1 and Hras^oncoX2 animals. CGI assay suggests a 
1.8-fold overgrowth (P = 0.002) in Hras^oncoX1 and 3.3-fold overgrowth 
(P < 0.0001) in Hras^oncoX2 epidermis. Error bars indicate ± s.d. (c) and 
± s.e.m. (f). **(P ≤ 0.01) in f indicates statistical significance of 
comparison to control. For CGI assay (e, f), data points are individual 
embryos: control (n = 9), Hras^oncoX1 (n = 8) and Hras^oncoX2 (n = 11). 
Scale bars, 5 mm (a) and 50 µm (b). 

The success of the approach depended upon our ability to: (1) modify 
growth at a low multiplicity of infection (MOI ≤ 1); (2) measure 
individual shRNA abundance in the pool; (3) transduce embryonic 
epidermis at a MOI ≤ 1; and (4) achieve complete screen coverage, in 
which every shRNA in the pool is tested. We set up a series of controls 
to ensure that these parameters were met. Underscoring the feasibility 
of pooled formats for in vivo RNAi screens, we demonstrated that 
targeting of (1) anaphase promoting complex component Anapc5 
during normal growth, and (2) Hras1 during oncogenic hyperplasia, 
reduced average clone sizes, even with transductions where most cells 
harboured only a single shRNA (Supplementary Fig. 5). 

To quantify individual shRNA representation in a complex pool, we 
used the Illumina-based count-by-sequencing principle (Supplementary 
Fig. 6). We designed oligonucleotides to amplify the target 
sequence of each shRNA, and optimized pre-amplification and 
clean-up pipelines to yield a product to apply directly to the sequencing cell. 
We tested our protocol against a defined template generated by 
combining genomic DNAs from independently transduced cell lines, so that 
individual genome-integrated shRNAs were present in amounts 
ranging from the equivalent of a single cell (6 pg) up to 2,048 cells (12.3 ng). 

We amplified and sequenced this standard set, and showed that 
reactions were: (1) quantitative, with increased sequencing reads cor- 
responding to shRNA abundance in the pool; (2) sensitive, detecting 
all three single-copy shRNAs; and (3) highly reproducible (Supplementary 
Fig. 6). Independent counts of the standard set showed identical 
sequencing bias for a given shRNA, which was therefore neutralized in 
relative comparisons of absolute counts, especially with ≥32 copies of 
the shRNA. Indeed, a >30-fold screen coverage proved sufficient to 
sample all shRNAs in our pool (see below). At this level, growth-neutral 
shRNAs were >1,000-fold represented in the E18.5 sequencing 
quantification reaction, because each E9.5 epidermal cell generates ~40 cells 
by E18.5. 

We next determined that at an infection level of 13-27%, most 
transduced epidermal keratinocytes carried a single lentivirus (MOI ≤ 1) 
(Supplementary Fig. 7). To ensure that at least 30 individual cells were 
infected with each shRNA at E9.5, a pool of ~78,000 shRNAs required 
~10^6 cells to be targeted. We used high-resolution imaging of 
TO-PRO3-labelled embryos and established that at E9.5, surface ectoderm 
contained ≥120,000 cells per embryo. Together, this suggested that 
transducing ≥90 embryos would achieve the requisite coverage 
(Supplementary Fig. 8). 
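
The coverage arithmetic in this section can be worked through directly. The sketch below uses the figures quoted in the text (pool size, 30 cells per shRNA, cells per embryo, ~40-fold clonal expansion) and assumes, for simplicity, that every transduced cell carries exactly one shRNA and that the pool is uniformly represented; the function name and the choice of integer percentages are illustrative.

```python
POOL_SIZE = 78_000           # shRNAs in the pool
MIN_CELLS_PER_SHRNA = 30     # cells infected with each shRNA at E9.5
CELLS_PER_EMBRYO = 120_000   # surface ectoderm cells at E9.5
EXPANSION = 40               # cells generated per E9.5 cell by E18.5


def embryos_needed(infection_percent):
    """Embryos required for full coverage at a given infection level (in %)."""
    cells_to_target = POOL_SIZE * MIN_CELLS_PER_SHRNA
    transduced_per_embryo = CELLS_PER_EMBRYO * infection_percent // 100
    # Ceiling division with integers avoids floating-point rounding surprises.
    return -(-cells_to_target // transduced_per_embryo)


cells_to_target = POOL_SIZE * MIN_CELLS_PER_SHRNA         # 2,340,000 cells
representation_at_e18 = MIN_CELLS_PER_SHRNA * EXPANSION   # 1,200-fold
embryos_by_rate = {pct: embryos_needed(pct) for pct in (13, 22, 27)}
```

Under these assumptions the requirement ranges from 150 embryos at 13% infection to 73 at 27%, with roughly 90 embryos corresponding to an infection level of about 22%. The 30-fold coverage at E9.5 multiplied by the ~40-fold expansion also accounts for the >1,000-fold representation of growth-neutral shRNAs at E18.5.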


Screens identify known growth regulators 


We pooled the genome-wide collection of murine shRNA lentiviruses 
in roughly equal concentrations, and profiled the starting 
composition of the pool (t = 0) in transduced primary mouse keratinocytes 
(Fig. 2a). Physiological screens were performed in control and 
Hras^oncoX2 embryos transduced at E9.5 in utero (Fig. 2b). Epidermal 
cells were collected after 24 hours (initial pool) and 9 days of develop- 
ment, and integrated lentiviral hairpins from genomic DNAs were 
sequenced and quantified (Fig. 2c). 
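
The comparison of hairpin abundance between the initial pool and later development can be illustrated with a simple log2 fold change on library-size-normalized read counts. This is only a sketch of the general idea: the pseudocount, the normalization, the hairpin names and the counts are all assumptions, and it is not the statistical analysis the authors report.

```python
import math


def log2_fold_changes(initial_counts, final_counts, pseudocount=1):
    """log2 of (final frequency / initial frequency) for each shRNA."""
    initial_total = sum(initial_counts.values())
    final_total = sum(final_counts.values())
    changes = {}
    for shrna, start in initial_counts.items():
        end = final_counts.get(shrna, 0)
        # The pseudocount keeps hairpins that drop out entirely finite.
        start_freq = (start + pseudocount) / initial_total
        end_freq = (end + pseudocount) / final_total
        changes[shrna] = math.log2(end_freq / start_freq)
    return changes


# Hypothetical read counts for three hairpins.
initial = {"sh_neutral": 1000, "sh_essential": 1000, "sh_suppressor": 1000}
final = {"sh_neutral": 1000, "sh_essential": 250, "sh_suppressor": 1750}

changes = log2_fold_changes(initial, final)
```

A negative value marks a depleted hairpin (a candidate positive growth regulator), and a positive value marks an enriched hairpin (a candidate negative growth regulator).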

Our pre-amplification and sequencing reactions did not bias 
shRNA quantification, and Illumina sequencing reads were of high 
quality. They mapped to the shRNA library with predictable efficiency 
and indicated complete coverage of the pool (Supplementary Figs 9, 
10). Significantly altered shRNAs were identified and ranked on the 
basis of two independent methods. To ensure reproducibility, we used the 
DESeq statistical package, which accounted for biological variability 
among our replicates (sets of 30 embryos per condition) and is the best 
method to identify candidates per se (Fig. 2d-k). However, 
informative yet variable shRNAs can be excluded by the high stringency of 
DESeq, which reduces the ability to control for off-target effects by 
requiring that multiple shRNAs show consistent behaviour. We 
therefore also analysed pooled data sets (90 embryos per condition; 
Supplementary Fig. 11) using Fisher's exact test, which reduces 
variability by averaging individual shRNA abundance. By maximizing 
screen coverage, this method produces a more inclusive list of 
significantly altered shRNAs, and hence was preferred for ranking 
candidates identified by DESeq (Fig. 2f, j, k). Importantly, the overlap 
between these approaches was extensive (Supplementary Fig. 11), 
underscoring the robustness of our data. 
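
For readers unfamiliar with Fisher's exact test in this setting, the sketch below applies a two-sided test to a 2 × 2 table of read counts (one hairpin versus all other reads, initial pool versus final sample). The table layout and the example counts are assumptions, since the text does not give the exact contingency table used. The implementation is plain Python so that it runs without dependencies; `scipy.stats.fisher_exact` offers an equivalent routine.

```python
import math


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    log_denominator = _log_comb(row1 + row2, col1)

    def log_prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denominator

    observed = log_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p_value = sum(
        math.exp(log_prob(x))
        for x in range(lowest, highest + 1)
        if log_prob(x) <= observed + 1e-7
    )
    return min(1.0, p_value)


# Hypothetical hairpin: 300 of 1,000,000 reads initially, 60 of 1,000,000 later.
p_depleted = fisher_exact_two_sided(300, 999_700, 60, 999_940)
```

Because the test works on pooled counts, it has no notion of replicate-to-replicate variability, which is consistent with the text's description of it as the more inclusive of the two approaches.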

We identified ~1,800 genes as essential for normal growth (Fig. 2d, e 
and Supplementary Tables 1, 2) and significantly enriched for func- 
tion in protein synthesis (P = 3.1 Xx 10-*°) and gene expression 
(P = 2.6 X 10 '°). Genes encoding 60S and 40S ribosomal proteins were 
also highly represented and among the top 10% of all hits for normal 
growth, underscoring our screen’s power to identify regulators of normal 
growth/viability (Fig. 2f and Supplementary Fig. 12). Indeed, our top 
10 candidates for regulators of normal growth featured six ribosomal 
genes and two genes essential for messenger RNA splicing (Fig. 2j). 

In our screen for specific regulators of Hras^G12V-dependent 
oncogenic growth, ~160 genes surfaced as candidates (Fig. 2g and 
Supplementary Tables 3, 4). They diverged in identity and function from 
those implicated in normal growth regulation, as most housekeeping 
growth regulators were eliminated by pair-wise comparison of control 
and Hras^oncoX2 shRNA abundance (Fig. 2h). The oncogene-specific 
regulators included Hras1, and downstream Ras pathway members 
Raf1, and Mek and Akt proteins (Fig. 2i). Equally notable was the 
absence of upstream oncogenic Ras regulators, for example, guanine 
nucleotide exchange factors (GEFs) and GTPase-activating proteins 
(GAPs), which are not expected to arise in a screen for hairpins 
suppressing Hras^G12V-induced growth. Our top 10 hits for regulators 
of oncogenic growth included well-established Ras pathway members 
Akt3 and Raf1, as well as the Ras-regulated Pawr (Fig. 2k). Cyclin C 
(Ccnc), a putative regulator of haematopoietic stem cell quiescence, 
was also on this list, as was Mllt6, encoding a myeloid/lymphoid, or 
mixed-lineage, leukaemia (MLL) translocation partner and a 
component of an epigenetic modifier complex. 

The very top candidate for preferential regulation of oncogenic 
growth was Ctnnb1, encoding the Wnt signalling and intercellular 
adhesion effector β-catenin. At first glance, β-catenin seemed obvious, 
as its over-activation has been implicated in a variety of cancers, 
including those of skin. However, β-catenin is also thought to be 
essential for stem cells. Hence, it was surprising that its hairpins 
surfaced in a screen for selective inhibitors of oncogenic and not 
normal growth. Even more paradoxical was that in our parallel 
genome-wide screen, Ctnnb1 was the top candidate for negative regulation of 
normal epidermal growth (Supplementary Tables 1, 2). 


Figure 2 | Genome-wide RNAi screens for 
physiological regulators of normal and 
oncogenic growth identify expected and 
surprising regulators. a—c, Schematic of the RNAi 
screens based on relative enrichment/depletion of 
individual shRNAs over time. a, shRNAs against 
15,991 mouse genes are combined into a lentiviral 
pool whose composition is determined from the 
‘initial pool’ (t = 0) experiment, in which 
transduced cells are analysed 24h after infection. 
b, Genes that regulate normal and oncogenic 

Figure 3 | Suppressing β-catenin and Mllt6 selectively affects Hras-dependent 
epidermal hyperplasia. a, Modified CGI assay measures effect of 
shRNA-mediated gene knockdown in animals with Cre-activated transgene 
expression. Transduction with LV-Cre-RFP co-expressing scrambled (scram) 
shRNA, and LV-Cre co-expressing candidate-targeting shRNA, leads to 
generation of YFP+ RFP+ scrambled and YFP+ knockdown clones in control 
or Hras^G12V animals. Numbers of YFP+ cells (normalized to YFP+ RFP+) in 
control and Hras^oncoX2 animals reflect lentiviral mix composition after normal 
and oncogenic growth, respectively. b, Fewer YFP+ cells are found in oncogenic 
animals upon knockdown of Ctnnb1 and Mllt6 with independent shRNAs. 
c, Reduced EdU incorporation following Ctnnb1 knockdown in Hras^G12V 
animals contrasts with increased proliferation in control epidermis. Mllt6 
depletion also reduces EdU labelling in oncogenic growth. d, Immunoblot of 
control (Ctnnb1 heterozygous (Het)) and Ctnnb1 knockout (KO) keratinocyte 
lysates shows upregulation of plakoglobin. e, Establishment of cell adhesion 
48 h after Ca2+ shift is unaffected in keratinocytes treated with a Wnt-signalling 
inhibitor (XAV939) but impaired in Ctnnb1 knockout cells. E-cadherin (green) 
marks adherens junctions and DAPI (4',6-diamidino-2-phenylindole; blue) 
labels the nuclei. f, Unlike control cells or cells treated with a Wnt inhibitor, 
Ctnnb1 knockout keratinocytes form overgrown foci upon reaching 
confluence. Error bars (b, c) indicate ± s.e.m. Data points (b, c) represent 
individual embryos with n = 6 (shCtnnb1 and shScram in control), n = 7 
(shMllt6 no. 4,294 in control), n = 8 (shMllt6 no. 1,271), n = 9 (shMllt6 no. 
4,294 in Hras^G12V), or n = 10 (shScram in Hras^G12V), each scored through 
immunofluorescence analysis of ten 425.1 µm² images. NS, not significant 
(P > 0.05); *P ≤ 0.05 and **P ≤ 0.01 indicate statistical significance. Scale 
bars, 50 µm (e), 10 µm (f). 


Validating oncogene-specific regulators 


We chose Ctnnb1 and Mllt6 for further study. For both Ctnnb1 and 
Mllt6, a direct correlation was observed between transcript 
knockdown in vitro and severity of growth defects in vivo (Supplementary 
Fig. 13), strongly arguing against off-target effects. We validated our 
candidates as oncogenic growth regulators with a modified in vivo 
CGI assay involving two lentiviral vectors (Fig. 3a). One vector, encoding 
Cre recombinase fused to monomeric RFP (LV-Cre-mRFP), contained a 
scrambled shRNA control. In the other vector, untagged Cre was 
used; this vector encoded the test shRNA against the candidate. 
Transduction of E9.5 control or Hras^oncoX2 Cre-reporter embryos 
marked two separate populations: RFP+ YFP+ cells represented the 
baseline rate of normal (in control) or oncogenic growth (in Hras^oncoX2); 
YFP+ cells represented the rate of growth that occurs when the target 
transcript is depleted. 

The ratios of YFP+ cells normalized to YFP+ RFP+ cells in Hras^oncoX2 
and control animals revealed that two independent Ctnnb1 shRNAs 
displayed reductions of ~2-4-fold in oncogenic relative to normal 
growth. Similar ~2-fold reductions in YFP+ cells were observed in 
the Hras^oncoX2 background when Mllt6 transcripts were diminished 
(Fig. 3b). The physiological effects of Ctnnb1 or Mllt6 knockdown were 
profound: the neonatal oncogenic phenotype was significantly 
ameliorated, and epidermal proliferation in Hras^oncoX2 embryos was markedly 
and reproducibly suppressed. By contrast, no significant effects were 
seen on apoptosis (Fig. 3c and Supplementary Fig. 13). 

As interesting as the selective effects of Ctnnb1 knockdown in 
suppressing Hras^G12V-dependent oncogenic growth were its positive 
effects on normal growth. These differences seemed to be 
physiologically relevant, as they were reflected at the level of EdU incorporation 
and thickness of epidermal tissue (Fig. 3c and Supplementary Fig. 14). 
Although hitherto overlooked, the proliferative effects of Ctnnb1 
hairpins on normal epidermis are recapitulated upon conditional 
targeting of β-catenin. 

β-Catenin is both an adherens junction component and a nuclear 
cofactor for Wnt regulators in the LEF/TCF family and other 
DNA-binding proteins. However, in contrast to its nuclear functions, 
β-catenin's role in adhesion has been assumed to be redundant with 
that of plakoglobin. Given that intercellular defects can promote 
proliferation, we revisited this issue using a sensitive in vitro adhesion assay 
(Fig. 3d-f and Supplementary Fig. 15). Despite plakoglobin 
upregulation, Ctnnb1-null keratinocytes inefficiently formed cell-cell 
adhesions upon calcium induction. Moreover, the Wnt inhibitor XAV939 
(ref. 25) failed to phenocopy these defects. Finally, consistent with the 
view that loss of β-catenin compromises contact inhibition and leads 
to cellular overgrowth, Ctnnb1-null cells were hyperproliferative and 
formed overgrown foci upon reaching confluence. 


Oncogenic growth and Wnt signalling 


Although intercellular adhesion is often viewed as tumour suppress- 
ive, Wnt signalling is often associated with oncogenic growth. To test 
whether this might contribute to the negative effects of Ctnnb1 
knockdown on Hras^G12V skins, we transduced embryos with both a 
Wnt-reporter and LV-Cre (Fig. 4a). In E18.5 control animals, reporter 
expression was predictably restricted to developing hair follicles 
and largely abolished with concomitant Ctnnb1 knockdown. 

Intriguingly, Wnt-reporter expression was expanded throughout 
transduced Hras^oncoX2 interfollicular epidermis. In addition, 
Hras^G12V-expressing epidermis displayed ectopic nuclear β-catenin and >6-fold 
upregulation of Axin2 transcripts (Fig. 4a and Supplementary Fig. 16a-c). 
Conversely, bone morphogenetic protein signalling, which is antagonistic 
to Wnt signalling in skin, was downregulated in Hras^G12V epidermis, 
and was not rescued by β-catenin depletion, suggesting its independence 
of Wnt in this oncogenic context (Supplementary Fig. 18). 

Mllt6 expression paralleled β-catenin and Wnt-reporter activity, both 
in normal hair follicles and in evaginating structures of Hras^oncoX2 skin 
(Supplementary Fig. 16b). Chromatin immunoprecipitation followed by 
next-generation sequencing (ChIP-seq) analysis showed that Tcf3 and 
Tcf4 bound to a conserved Lef/Tcf motif upstream of Mllt6 (Fig. 4b). A 
299-base-pair (bp) segment (green line) encompassing this site drove 
LEF1/β-catenin-dependent luciferase reporter activity in a manner 
comparable to the 331-bp Tcf3/4 binding site of Axin2, an established 
Wnt-target gene (Fig. 4c and Supplementary Fig. 16d). In agreement, 
depletion or loss of β-catenin in embryonic epidermis in vivo reduced Mllt6 
transcript levels (Fig. 4d). 


β-catenin and Mllt6 in epidermal tumours 


Although our screens were conducted on embryonic mouse skin, our 
findings showed relevance to cancer. RNA-seq analysis revealed that 
Ctnnb1 or Mllt6 depletion in oncogenic Hras epidermis affected a 
shared set of transcripts (P = 2.48 X 10 *° and P=8.16 X10 7°) 
that globally suppressed pathways promoting tumorigenesis (for 
example, Myc, E2f1) and enhanced those restricting growth (for 
example, Trp53, Cdkn2a; Fig. 4e, f). Moreover, in human squamous 
cell carcinomas (SCC), β-catenin and MLLT6 were often upregulated 
and nuclear (notably in the basal layer, where cancer stem cells 
reside). Our analysis of 75 different human skin SCCs showed that 
most tumours expressed both proteins, with significant correlation in 
their expression (Supplementary Fig. 18). It remains to be seen 
whether their co-expression in tumours and a shared effect on 
transcriptional profile during oncogenic growth reflects a functional 
interaction between our candidates, or is a result of their independent 
effect/importance on the cellular machinery at the heart of oncogenic 
growth. 

Figure 4 | Hras^G12V-induced epidermal growth affects other signalling 
pathways. a, Wnt-reporter activity (red) is restricted to hair placodes in control 
skin (top) but extends to interfollicular epidermis in Hras^oncoX2 animals 
(bottom). YFP (inset) marks LV-Cre-transduced epidermis. White dotted line 
demarcates dermal-epidermal boundary. b, ChIP-seq peaks on chromosome 
11 reveal Tcf3 (blue) and Tcf4 (pink) binding sites in Mllt6 and Axin2 promoter 
regions. Negative control (grey) is total genome DNA. Green bars represent the 
~300-bp fragment used to validate Tcf3/4 binding. c, Human LEF1 (TCF3/4 
family member) and stabilized β-catenin (ΔNβ-cat) together promote 
luciferase activity when putative TCF3/4 binding sites of Mllt6 and Axin2 are 
used as drivers. These effects are not observed when the TCF3/4 binding motifs 
are mutated. d, Mllt6 and Ctnnb1 mRNA epidermal levels are reduced by 
Ctnnb1 knockdown (shCtnnb1 no. 450) or knockout (KO). e, Transcriptional 
profile of Hras^G12V epidermal progenitors reveals repression of tumour 
suppressors (for example, Trp53, Cdkn2a, Rb1) and activation of oncogenic 
signalling (for example, Myc, E2f1). shRNA-mediated depletion of Ctnnb1 or 
Mllt6 in Hras^G12V epidermis significantly counters these transcriptional 
changes. Red vertical lines represent significant activation z-score (twofold) 
and P value of a correct prediction (P ≤ 0.01). f, Significant overlap in 
differentially regulated transcripts is observed following depletion of β-catenin 
and Mllt6 in Hras^G12V epidermal progenitors. Error bars indicate ± s.e.m. 
(c) and ± s.d. (d). In the real-time PCR experiment (d), data are shown for three 
embryos assayed in two independent reactions (n = 6). *P ≤ 0.05, **P ≤ 0.01. 
Scale bars, 50 µm. 

We next tested whether β-catenin and Mllt6 are physiologically 
relevant to Hras^G12V-dependent tumour initiation and maintenance. 
Whereas clonal LV-Cre-mediated activation of Hras^G12V expression 
in mice resulted in squamous papilloma formation as early as 3 weeks 
of age, concomitant constitutive expression of Ctnnb1 or Mllt6 
shRNAs delayed tumour initiation (Fig. 5a). Moreover, growth of 
orthotopically transplanted SCC cells was significantly reduced 
following candidate depletion (Fig. 5b and Supplementary Fig. 19). 

We extended this physiological relevance by performing xenografts 
of human SCC cells transduced with lentivirus harbouring scrambled shRNA 
or shRNAs targeting human CTNNB1 or human MLLT6. Tumour 
initiation was significantly delayed, with MLLT6 showing stronger 
effect than CTNNB1 (Fig. 5c and Supplementary Fig. 19). 

Finally, to assess whether Ctnnb1 and Mllt6 are required for tumour 
maintenance, we engineered an LV-Cre vector that allows for 
doxycycline-regulated shRNA expression, thereby enabling induced depletion of 
Mllt6 and Ctnnb1 following tumour formation in adult animals (Fig. 5d 
and Supplementary Fig. 20). Both had negative effects on tumour 
maintenance, with some tumours showing partial regression. Thus, 
the tumour-suppressive effects of Ctnnb1 and Mllt6 shRNAs as first 
revealed in embryogenesis appeared to be functionally relevant to adult 
tumorigenesis. 


Discussion 


The urgent need to understand cancer has fuelled human cancer
genome sequencing and in vitro RNAi-based screening efforts to
identify genes that preferentially affect cancer cells but not their normal
counterparts. Although promising in concept, assay conditions
and cell-line histories can profoundly affect genes identified in these
screens. Although xenogeneic transplantations of transduced
human cells offer improvements, they often incompletely simulate
carcinoma ontogeny, which depends upon complex interactions with
local and systemic environments. By targeting cells in their normal
physiological context, we correct these deficits and abrogate many
caveats, including epigenetic, genetic and stress-induced alterations
in gene expression, all of which introduce heterogeneity and increase
coverage requirements when cells are grown on plastic.

Figure 5 | β-catenin and Mllt6 depletion impair HrasG12V-dependent
tumorigenesis. a, shRNA-mediated depletion of Ctnnb1 or Mllt6 delays
spontaneous tumour initiation in oncogenic Hras mice (n = 9 in all conditions
except LV-Scram-transduced oncogenic Hras (n = 18)). Control lines correspond
to animals transduced with shRNAs with no effect on tumorigenesis.
b, Tumour volumes of Ctnnb1- and Mllt6-depleted mouse SCC transplants are
significantly reduced after 30 days of growth. c, Tumour initiation following
xenotransplantation of shRNA-transduced human SCC cells is significantly
delayed following knockdown of human CTNNB1 or MLLT6. d, Induction of
Ctnnb1 or Mllt6 knockdown in pre-existing spontaneous mouse papillomas
results in impaired growth and sometimes regression. a-d, Transduction of
scrambled shRNA served as control. Error bars (b, d) indicate s.e.m.
b-d, n = 12 transplants (b, c) or tumours (d). *P = 0.05 and **P = 0.01
indicate statistical significance of the observed differences.

Our study accentuates a particular importance of β-catenin in promoting
oncogenic effects, as it surfaced at the top of >15,000 genes in
our screen. Moreover, increased Hras–mitogen-activated protein kinase
signalling drove β-catenin’s effects from negative to positive, as normal
epidermal growth was actually impeded by β-catenin. In this regard, it is
interesting that leukaemias also seem to be more sensitive to activated
β-catenin than their normal counterparts. Our findings further suggest
that β-catenin’s ability to balance tissue growth is exerted through its
antagonistic functions in intercellular adhesion and transcriptional 
activation. 

A myriad of new candidates from our screen await further investigation.
Among them are chromatin modifiers, which have been
increasingly implicated in human cancers. In this regard, our validation
of Mllt6 is intriguing, as MLL proteins are known to associate
with DOT1L H3K79-methyltransferase complexes. Given Mllt6’s
selective effects on oncogenic growth, it is tempting to speculate that 
this protein might function by guiding its histone modifier complex to 
a key cancer target gene(s). Although detailed understanding of this 
and other candidates awaits experimentation, our methodology paves 
the way for future studies aimed at uncovering mechanisms of SCC 
progression, with the hope of identifying targets that selectively com- 
promise growth of one of the world’s most prevalent and life-threatening 
cancers. 


METHODS SUMMARY 


Animals were on a C57BL/6 background. Lentiviral production and ultrasound-guided
injection into E9.5 amniotic space are as described. Transduced embryos
were developed to E18.5, after which epidermal suspensions were prepared for 
gDNA isolation. gDNAs from sets of 30 transduced embryos were combined and 
used as template for a 21-cycle pre-amplification PCR. For identification and 
quantification of shRNAs, clean pre-amplification product was sequenced using 
Illumina HiSeq2000, and the sequencing output was aligned to the TRC 2.x library 
with Burrows-Wheeler Aligner (BWA) with a maximum edit distance of three. 
Bioinformatics analyses of RNA-seq data and candidates identified by our screens 
were performed using IPA software (Ingenuity Systems). Figures were prepared 
using Adobe Photoshop and Illustrator CS5. Graphing and statistical analyses were 
performed using Prism 5 (GraphPad Software). Descriptions of antibodies and 
mouse strains are provided in Methods. 


Full Methods and any associated references are available in the online version of
the paper.


METHODS


Lentivirus production and in vivo and in vitro transductions. Large-scale production
and concentration of lentivirus were performed as previously described.
Male and female animals were used in equal numbers, and all mice were on the
C57BL/6 background, including Gt(ROSA)26Sortm1(EYFP)Cos (Jackson Laboratories,
donated by A. McMahon), FR-HrasG12V (9) and Tg(K14-cre)1Efu. Mice were
housed and cared for in an AAALAC-accredited facility, and all animal experiments
were conducted in accordance with IACUC-approved protocols. Randomization
and blinding were not used in this study. A detailed description of the in vivo lentiviral
transductions can be found elsewhere. For lentiviral infections in culture, cells
were plated in 12-well dishes at 70,000 cells per well and incubated with lentivirus in
the presence of polybrene (100 μg ml⁻¹) overnight. After 2 days, infected cells were
sorted on the basis of RFP expression (mouse and human SCC cells) or positively
selected with puromycin (1 μg ml⁻¹) for 4 days and processed for mRNA analysis.
mRNA quantifications. Total RNAs were isolated from FACS-sorted cells from 
E18.5 epidermis or from flash-frozen, pulverized kidney, using the Absolutely RNA 
Microprep kit (Stratagene). Complementary DNAs were generated from 1 μg of
total RNA using the SuperScript Vilo cDNA synthesis kit (Life Technologies). Real- 
time PCR was performed using the 7900HT Fast Real-Time PCR System (Applied 
Biosystems) and gene-specific and Ppib control primers. Real-time experiments 
were done on cells isolated from three transduced animals, or three independently 
transduced cell culture plates, and all reactions were performed in triplicate and in 
two separate runs. BRE-ZsGreen activity was measured using real-time PCR with 
ZsGreen-specific primers on cDNA from transduced epidermal cells as previously 
described.

Immunostaining and histological analyses. The following primary antibodies 
were used: chicken anti-GFP (1:2,000; Abcam); mouse anti-β-catenin (15B8,
1:1,000; Sigma); guinea pig anti-K5 (1:500; E. Fuchs); rat anti-CD34 (RAM34, 
1:100; eBioscience), anti-Ecad (ECCD-1, 1:200; M. Takeichi) and anti-nidogen 
(ELM1, 1:2,000; Santa Cruz); rabbit anti-caspase 3 (AF835, 1:1,000; R&D), anti-RFP
(PM005, 1:2,000; MBL), anti-K10 (PRB-159P, 1:1,000; Covance), anti-filaggrin
(PRB-417P, 1:2,000; Covance), anti-pSmad1/5/8 (AB3848, 1:1,000; Millipore) and 
anti-Mllt6 (NBP1-89222, 1:100; Novus Biologicals). Secondary antibodies were 
conjugated to Alexa-488, 546 or 647 (1:1,000, Life Technologies). Detection of 
pSmad1/5/8 was enhanced using Tyramide Signal Amplification (Perkin
Elmer). Cells and tissues were processed as previously reported, and mounted
in ProLong Gold with DAPI (Life Technologies). Skin squamous cell cancer tissue 
array (SK802a) was obtained from US Biomax Inc. Immunohistochemistry pre- 
parations were developed using ImmPRESS Universal Antibody Polymer 
Detection method (Vector Laboratories). Confocal images were captured by a 
scanning laser confocal microscope (LSM510 and LSM780; Carl Zeiss) using 
Plan-Apochromat 20×/0.8 oil and C-Apochromat 40×/1.2 water lenses. Images
were processed using ImageJ and Adobe Photoshop CS3. To quantify the number 
of ectodermal cells at E9.5, embryos were fixed in 4% paraformaldehyde, permea- 
bilized in PBS + 0.1% Triton (Sigma), and the nuclei were labelled with TO-PRO-3 
as recommended (Life Technologies). Tiled Z-stack images were collected on a
Zeiss LSM780 using a Plan-Apochromat 63×/1.4 oil lens. Stacks and metadata
were imported into MATLAB (Mathworks) using the LOCI Bio-Formats
Importer. For each stack, the surface was located by finding the first Z position
with an average intensity threefold above background, and the stack was cropped
to 6 μm corresponding to the surface epithelium. The resulting images were segmented
in three dimensions using Imaris (Bitplane AG) to obtain counts of nuclei.
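
The surface-finding and cropping step can be illustrated with a minimal Python sketch. This is not the MATLAB code used in the study: the function names, the nested-list image representation, the way background is supplied and the Z-step parameter are all assumptions made for illustration, and "threefold above background" is read here as a plane mean greater than three times the background value.

```python
from statistics import mean


def find_surface_index(plane_means, background, fold=3.0):
    """Return the index of the first Z plane whose mean intensity exceeds
    fold x background, or None if no plane qualifies (assumed reading of
    'threefold above background')."""
    threshold = fold * background
    for z, value in enumerate(plane_means):
        if value > threshold:
            return z
    return None


def crop_to_surface(stack, background, z_step_um, depth_um=6.0, fold=3.0):
    """Crop a Z-stack (list of 2D planes, each a list of pixel rows) to the
    planes covering depth_um below the detected surface.
    z_step_um is a hypothetical acquisition parameter, not given in the text."""
    plane_means = [mean(px for row in plane for px in row) for plane in stack]
    start = find_surface_index(plane_means, background, fold)
    if start is None:
        return []
    n_planes = max(1, round(depth_um / z_step_um))
    return stack[start:start + n_planes]


if __name__ == "__main__":
    # Toy stack: seven 2 x 2 planes with uniform intensities.
    toy = [[[v, v], [v, v]] for v in (1, 1, 5, 6, 7, 8, 2)]
    cropped = crop_to_surface(toy, background=1.0, z_step_um=2.0)
    print(len(cropped), cropped[0][0][0])
```

With a 2-μm step, the 6-μm window corresponds to three planes starting at the first plane above threshold; segmentation and nuclear counting would then operate on the cropped stack.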
Flow cytometry. Primary epidermal keratinocytes were isolated and then purified
by fluorescence-activated cell sorting (FACS) using BD FACSAria II (BD
Biosciences). Nucleotide analogue EdU (50 μg per g body weight) was injected
intraperitoneally 2 h before processing, and EdU (Life Technologies) incorpora- 
tion and active caspase 3 (BD Pharmingen) assays were performed as recom- 
mended. Immune cell infiltration was analysed in whole skin dissociated with a 
sequential incubation in collagenase (Sigma; 0.25% in HBSS for 90 min) and 
trypsin (Gibco; 0.25% in PBS for 15 min) at 37 °C. The following biotin-conjugated 
rat antibodies (1:100; Pharmingen) were used: anti-CD11b (M1/70), anti-CD103 
(M290), anti-Ly-6G/C (RB6-8C5), anti-CD3e (145-2C11), anti-CD45 (30-F11) 
and anti-CD45R (RA3-6B2). YFP/RFP quantification was based on detection of 
the native protein in unfixed cells. Flow cytometric analysis was performed on BD 
LSR II.

Cell culture assays. Cells were cultured in 0.05 mM Ca²⁺ (E18.5 mouse epidermal
keratinocytes and SCC cells) or 1.5 mM Ca²⁺ (human SCC) E-media
supplemented with 15% serum. Cell adhesion in primary epidermal keratinocytes
seeded at low confluence was assayed by replacing their growth medium with
1.5 mM Ca²⁺ E-media, and fixing them at different times thereafter. Nucleotide
analogue EdU (10 μM) was added to cell culture media 90 min before processing,
and contact inhibition was analysed in cells 3 days after reaching confluence.
Inhibition of Wnt signalling was achieved by addition of 5 μM tankyrase inhibitor
XAV939 (IC50 values 11 nM (Tnks1) and 4 nM (Tnks2)) to the media 12 h before
the start of the experiments.

Lentiviral constructs. Sequences of RNAi constructs are listed in Supplementary
Table 5. Design of LV-RFP, LV-GFP and LV-Cre has been previously reported.
The lentiviral construct for inducible shRNA expression is a modification of
tet-pLKO-puro (Addgene plasmid 21915), where the IRES-Puro cassette was
replaced between the XmaI and KpnI sites with the ligation of PCR-amplified
XmaI/NheI-flanked P2A fragment and NheI/KpnI-flanked nlsCre cDNA. The
lentiviral Wnt-reporter was fashioned after the lentiviral Beta-catenin Activated
Reporter. It includes 12 Tcf/Lef binding sites followed by a minimal TK promoter
and an mRFP1 transgene that were subcloned into a pLKO.1 backbone
between KpnI and NheI sites. The lentiviral bone morphogenetic protein reporter,
which contains a pair of bone morphogenetic protein response elements, is a derivative
of BRE-ZsGreen, where the reporter cassette between XhoI and NheI sites
has been placed between SalI and NheI sites of the pLKO-nlsCre-MCS vector.
Tumour-free survival. Control and oncogenic Hras animals were transduced at E9.5
with low-titre LV-Cre carrying constitutively expressed or inducible shRNAs:
scrambled control, or test shRNAs against Ctnnb1 and Mllt6. Transductions were
confirmed by real-time PCR of P7 (newborn) littermates, and the remaining
animals were monitored for an additional 12 weeks. Animals were assessed every
2-3 days, and scored positive when tumours were larger than 2 mm in diameter.
Animals transduced with an LV-Cre containing inducible shRNAs were allowed
to form tumours for 60 days, at which point individual tumours were measured
along their short and long axes using a digital caliper (t = 0). Next, tumour-bearing
animals were treated by a single intraperitoneal injection of doxycycline
(100 μl of 50 mg ml⁻¹) and maintained on doxycycline-containing chow for
8 weeks, and tumour size was assayed every 7 days. Because the tumour volumes
at t = 0 ranged from 4 to 20 mm³, the assayed tumour size was normalized
to the initial tumour volume, and expressed as fold-change over time.
Transplantation of SCC cells transduced with control shRNA, or shRNAs targeting
mouse and human Ctnnb1 and Mllt6, into immunocompromised nude recipients
was performed as previously described, and animals were monitored
every 3 days for a month. Tumour size was measured using a digital caliper, and
tumour volume was calculated using the formula [(length × width)² × π]/6.
Tcf3/4 ChIP-seq and luciferase assay. Details of the Tcf3/4 ChIP-seq will be 
reported elsewhere. For luciferase assays, passage 9-14 293FT cells were seeded in 
96-well culture plates and transfected at 60-70% confluence using standard cal- 
cium phosphate procedures. Cells were co-transfected with control Renilla pRL- 
TK, and combinations of 50 ng pGL3-Mllt6, Mllt6™""*"", Axin2 or Axin2™"""" 
and 200 ng of K14-expression vectors encoding Lef1, ΔNβ-cat or control (empty
vector). After 44 h, cells were collected and luciferase activity was measured using 
the TD-20/20 luminometer (Turner Biosystems) and the Dual-Glo Luciferase 
Assay System (Promega). Each transfection was performed in duplicate and 
repeated seven times. 

Sample preparation and pre-amplification. Epidermal cells were isolated from 
E18.5 mouse skin using previously established procedures. Cells from individual
embryos were used for genomic DNA isolation with the DNeasy Blood & Tissue 
Kit (Qiagen), and each sample was analysed for target transduction using real-time
PCR. gDNAs from 30 transduced embryos were pooled, and 200 μg of the
total was used as template in a 10-ml pre-amplification reaction with 21 cycles 
and Phusion High-Fidelity DNA Polymerase (NEB). PCR products were run on a 
2% agarose gel, and a clean ~200-bp band was isolated using QIAquick Gel 
Extraction Kit as recommended by the manufacturer (Qiagen). Final samples 
were then sent for Illumina HiSeq 2000 sequencing.

Sequence processing and relative shRNA quantification. For each genotype, 
DNA from 30 embryos was pooled and independently sequenced using custom 
forward (5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTAC 
ACGACGCTCTTCCGATCTATATCTTGTGGAAAGGACGAAACACC-3’) and 
reverse (5'-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTAATTGTGG 
ATGAATACTGCCATTTGTC-3’) oligonucleotides. Illumina reads were trimmed
to the 21-nucleotide hairpin sequence using the FASTX-Toolkit and aligned to the 
TRC 2.x library with BWA (v0.6.2) using a maximum edit distance of 3. Putative 
growth regulators were identified by combining two methodologies. First, Illumina 
reads from 3 sets of 30 embryos were treated as independent biological replicates in 
the DESeq R package. Dispersions (variability) for each hairpin were estimated
using a local fit to the data for each genotype, and hairpins with a P value <0.05 by 
the negative binomial test were considered for downstream analysis. Second, an 
analysis was carried out on a pooled data set in which the reads from three sets of 
embryos were combined to maximize screen coverage and average biological vari- 
ability. Although this precludes estimation of within-group variability, it has the effect 
of reducing noise for poorly counted hairpins when operating close to the minimum 
required screen coverage. Fisher’s exact test was applied on a per-hairpin basis using
combined reads by assembling a 2 × 2 contingency table. The columns of the table
are the treatment conditions (for example, control and oncogenic Hras), and the rows
correspond to the sequencing counts for a given hairpin in the first, and the counts for
all other hairpins in the pool in the second. The test thus calculates the probability of
observing a difference in hairpin representation relative to the expected representation
in the pool. We further adjust the P value for multiple testing using the
Benjamini-Hochberg correction. Hairpins with a P value <0.05 were considered 
for further analysis. A gene was considered significantly enriched or depleted if at 
least two hairpins exhibited a twofold or greater change in normalized reads with a 
significant P value, and no hairpins in the set exhibited a change of equal mag- 
nitude in the opposite direction. Hits common to both analyses were ranked by 
number of significant hairpins and the magnitude of their effect. All analyses were 
carried out in the R statistical environment, with some plots produced using the
ggplot2 package. Gene lists were imported into the Ingenuity Pathway Analysis
software (Ingenuity Systems), and analyses and graphic outputs of relative enrich- 
ment in functional gene categories were performed as recommended. 
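
As an illustration of the pooled per-hairpin test described above, the following Python sketch builds the 2 × 2 table for each hairpin, computes a two-sided Fisher's exact P value and applies the Benjamini-Hochberg adjustment. It is a sketch under stated assumptions and not the R code used in the study: the function names and toy counts are hypothetical, the two-sided definition (summing all tables no more probable than the observed one) is an assumption, and the direct hypergeometric sum is practical only for small toy counts. At real sequencing depth, an established routine such as `fisher.test` in R or `scipy.stats.fisher_exact` would be the natural choice.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].
    Rows: this hairpin / all other hairpins; columns: control / test."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x control reads for this hairpin.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def per_hairpin_adjusted_p(control_counts, test_counts):
    """Adjusted P value per hairpin from pooled read counts (dicts)."""
    total_c = sum(control_counts.values())
    total_t = sum(test_counts.values())
    names = list(control_counts)
    raw = []
    for name in names:
        a = control_counts[name]
        b = test_counts.get(name, 0)
        raw.append(fisher_exact_two_sided(a, b, total_c - a, total_t - b))
    return dict(zip(names, benjamini_hochberg(raw)))


if __name__ == "__main__":
    control = {"sh1": 50, "sh2": 50, "sh3": 50}   # hypothetical read counts
    test = {"sh1": 5, "sh2": 70, "sh3": 75}
    print(per_hairpin_adjusted_p(control, test))
```

Because each hairpin is compared with the rest of the pool, differences in total read depth between the two conditions are absorbed by the second row of the table.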

Choice of statistical analysis of relative shRNA abundance. The strength of the 
Fisher’s exact test is that it can calculate a probability of observing a difference in 
shRNA representation in a comparison of pooled data sets. When operating close 
to the minimum required screen coverage or when it is not feasible to perform 
many independent replicates, this strategy can be advantageous to increase cover- 
age and reduce noise if combined with additional stringent criteria (that is, 
requiring a gene to be targeted by multiple hairpins) and validation. 

Because this methodology does not explicitly account for sample-to-sample 
variability (instead maximizing coverage and averaging out variability), we inde- 
pendently analysed our data using two additional statistical methodologies that 
directly address variability within biological replicates. Importantly, both of these 
methods used the same stringent set of thresholds (twofold change in hairpin
count, and a requirement for at least two hairpins to show a significant effect in the same
direction, and none in the opposite). First, we used DESeq, an R package
designed for the analysis of Illumina sequencing-based assays, which estimates 
and accounts for biological variability in a statistical test based on the negative 
binomial distribution. Second, we treated independently sequenced sets of 30 
embryos as biological replicates, and generated replicate-specific lists of candidate 
genes. Comparison of hits shared between these replicates to the hits identified in our 
analysis of pooled samples yielded a highly conserved set of candidate genes con- 
sistent with strong reproducibility of our data. Both analyses identify a list of candi- 
dates that substantially overlaps with those identified by our pooling and ranking 
scheme, with nearly all of our top hits identified regardless of the methodology. 
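
The gene-level thresholds used throughout (at least two significant hairpins with a twofold or greater change in the same direction, and none in the opposite direction) can be sketched in Python as follows. The function name, the input layout and the toy gene labels are hypothetical. The sketch also assumes that "none in the opposite direction" means no hairpin with a twofold or greater opposite change regardless of its P value, and that fold changes are strictly positive ratios of normalized reads.

```python
from collections import defaultdict
from math import log2


def call_genes(hairpins, alpha=0.05, min_fold=2.0, min_hairpins=2):
    """Classify genes from per-hairpin results.

    hairpins: iterable of (gene, fold_change, adjusted_p) tuples, where
    fold_change is test/control normalized reads (assumed > 0).
    Returns {gene: 'enriched' | 'depleted'} for genes passing the rule."""
    by_gene = defaultdict(list)
    for gene, fold, p in hairpins:
        by_gene[gene].append((log2(fold), p))

    threshold = log2(min_fold)
    calls = {}
    for gene, rows in by_gene.items():
        up_sig = sum(1 for lfc, p in rows if lfc >= threshold and p < alpha)
        down_sig = sum(1 for lfc, p in rows if lfc <= -threshold and p < alpha)
        up_any = any(lfc >= threshold for lfc, _ in rows)
        down_any = any(lfc <= -threshold for lfc, _ in rows)
        if up_sig >= min_hairpins and not down_any:
            calls[gene] = "enriched"
        elif down_sig >= min_hairpins and not up_any:
            calls[gene] = "depleted"
    return calls


if __name__ == "__main__":
    toy = [
        ("GeneA", 0.4, 0.01), ("GeneA", 0.3, 0.02), ("GeneA", 1.1, 0.50),
        ("GeneB", 0.4, 0.01), ("GeneB", 2.5, 0.01), ("GeneB", 0.3, 0.01),
        ("GeneC", 0.4, 0.01), ("GeneC", 0.9, 0.30),
    ]
    print(call_genes(toy))  # only GeneA passes in this toy input
```

In the toy input, GeneB fails because one hairpin changes twofold in the opposite direction, and GeneC fails because only a single hairpin is significant.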

When conducting shRNA drop-out screens, perhaps the most important criterion
in identifying potential candidates is that a gene be targeted by multiple,
independent hairpins to avoid off-target effects. We thus felt our data would be
best served by combining an analysis of pooled data, which tends to be more
inclusive at the level of hairpins and enables ranking by number of independent
hairpins, and the results of DESeq, which ensures reproducibility of hits.
RNA-seq and IPA network analyses. Epidermal progenitors were FACS sorted 
into TRIzol LS (Invitrogen) and RNA was purified using Direct-zol RNA MiniPrep
kit (Zymo Research) per manufacturer’s instructions. Quality of the RNA was 
determined using Agilent 2100 Bioanalyzer, with all samples passing the quality 
threshold of RNA integrity numbers (RIN) > 8. Library preparation using Illumina 
TruSeq mRNA sample preparation kit was performed at the Weill Cornell Medical
College Genomic Core facility, and cDNA was sequenced on Illumina HiSeq 2000.
Reads were mapped to mm9 build of the mouse genome using TopHat, and transcript
assembly and differential expression were determined using Cufflinks.
Differentially regulated transcripts were analysed in IPA (Ingenuity Systems), 
and the upstream transcriptional regulators were predicted using the Upstream 
Regulator Analysis package, with a significant overlap between the data set genes 
and transcription factor targets set at P < 0.01, and the regulation direction (activated
or inhibited) at z-score = 2.

Statistics. All quantitative data were collected from experiments performed in at 
least triplicate, and expressed as mean ± s.d. or s.e.m. The fits of cellular and tumour
growth were compared using the extra sum-of-squares F-test, and expression of
CTNNB1 and MLLT6 in human SCC tissue was analysed using a non-parametric
(Spearman) correlation. Differences between groups were assayed using a two-tailed
Student’s t-test in Prism 5 (GraphPad Software). Differences were
considered significant when P < 0.05.


Stimulated X-ray emission for materials science


M. Beye, S. Schreck, F. Sorgenfrei, C. Trabant, N. Pontius, C. Schüßler-Langeheine, W. Wurth & A. Föhlisch


Resonant inelastic X-ray scattering and X-ray emission spectros- 
copy can be used to probe the energy and dispersion of the elemen- 
tary low-energy excitations that govern functionality in matter: 
vibronic, charge, spin and orbital excitations. A key drawback
of resonant inelastic X-ray scattering has been the need for high 
photon densities to compensate for fluorescence yields of less than 
a per cent for soft X-rays. Sample damage from the dominant non-radiative
decays thus limits the materials to which such techniques
can be applied and the spectral resolution that can be obtained. A 
means of improving the yield is therefore highly desirable. Here we 
demonstrate stimulated X-ray emission for crystalline silicon at photon
densities that are easily achievable with free-electron lasers. The
stimulated radiative decay of core excited species at the expense of 
non-radiative processes reduces sample damage and permits narrow- 
bandwidth detection in the directed beam of stimulated radiation. 
We deduce how stimulated X-ray emission can be enhanced by seve- 
ral orders of magnitude to provide, with high yield and reduced sam- 
ple damage, a superior probe for low-energy excitations and their 
dispersion in matter. This is the first step to bringing nonlinear X-ray
physics in the condensed phase from theory to application.

In the soft X-ray region, the use of nonlinear techniques to enhance 
the signal levels has been prevented by the small cross-sections and the 
short lifetimes of core-excited states in the regime of a few femtoseconds.
In the past few years, free-electron lasers have become available,
producing ultrashort, intense soft X-ray pulses. Recently, the stimulation
of emission from a single fluorescence line in a rare gas and hard
X-ray/optical sum frequency generation have been demonstrated.

We present here stimulated X-ray emission from a solid-state sam- 
ple recorded at the free-electron laser in Hamburg (FLASH) for non- 
resonant silicon L-edge excitation at a photon energy of 115 eV. With 
free-electron laser radiation we produce regions with high 2p core 
excitation densities. The spontaneously emitted radiation from recom- 
bination of the 2p core holes (photon energy of 85 eV to nearly 100 eV) 
seeds the stimulated emission of soft X-ray photons. The emitted spec- 
trum is determined by the spontaneous emission as observed in a 
typical resonant inelastic X-ray scattering (RIXS) or X-ray emission 
spectroscopy (XES) experiment and thus conserves all the information 
and specificity of these methods. By carefully choosing the geometry, 
we can significantly enhance the weak fluorescence signal at the expense 
of Auger decays. Fewer electrons are emitted and electronic damage to 
the sample is minimized. By properly shaping the free-electron laser 
beam footprint on the sample the detected signal can be enhanced by 
orders of magnitude because the usually isotropic emission can be direc- 
ted towards the detector. This opens up the possibility of nonlinear 
spectroscopy in the X-ray region by combining concepts from non- 
linear optics with the high information depth of X-ray spectroscopy, 
which is intrinsically able to resolve femtosecond dynamics.

The absorption and stimulated emission probabilities $P$ can be approximated
for infinitesimally thin samples where the photons interact on
a short length $\mathrm{d}x$ as:

$$P_\mathrm{abs} = \sigma\,\rho_\mathrm{atom}\,\mathrm{d}x \quad\text{and}\quad P_\mathrm{stim} = \sigma\,\rho_\mathrm{ch}\,\mathrm{d}x \tag{1}$$

where $\rho_\mathrm{atom}$ is the number density of absorbing atoms and $\rho_\mathrm{ch}$ (for core
hole) is the number density of excited atoms for stimulation. The X-ray
absorption or stimulation cross-sections $\sigma$ are determined by the dipole
transition matrix elements between the core level and unoccupied
(absorption) or occupied (stimulated emission) valence states and
are assumed at this point to be approximately the same. The cross-section
is connected with the absorption length $\lambda = (\sigma\rho_\mathrm{atom})^{-1}$. The
effect of stimulated emission is large when the stimulation probability
approaches unity, $P_\mathrm{stim} = \sigma\,\rho_\mathrm{ch}\,\mathrm{d}x \to 1$, yielding:

$$\frac{\rho_\mathrm{ch}}{\rho_\mathrm{atom}} = \frac{\lambda}{\mathrm{d}x} \tag{2}$$

For effective stimulation, the absorption length relative to the interaction
lengths of stimulating photons has to be similar to the fraction
of core-excited atoms. For comparable interaction and absorption lengths,
every atom needs to be core-excited; a population inversion is required.
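
A short numerical sketch in Python may make the scaling in equation (2) concrete. The function names are hypothetical; the 100 nm and 1 μm lengths are used purely as round illustrative values, and the silicon atom density of about 5 × 10²² cm⁻³ and the 1 Mbarn (10⁻¹⁸ cm²) cross-section are assumed inputs rather than fitted parameters.

```python
def absorption_length_cm(sigma_cm2, rho_atom_cm3):
    """Absorption length lambda = 1 / (sigma * rho_atom), in cm."""
    return 1.0 / (sigma_cm2 * rho_atom_cm3)


def required_excited_fraction(absorption_length, interaction_length):
    """Fraction rho_ch / rho_atom at which the stimulation probability
    reaches unity, from equation (2); both lengths in the same unit.
    A value above 1 means the condition cannot be met in that geometry."""
    return absorption_length / interaction_length


if __name__ == "__main__":
    # Assumed round numbers: 1 Mbarn cross-section, ~5e22 atoms per cm^3.
    lam_cm = absorption_length_cm(1e-18, 5e22)
    print(f"absorption length ~ {lam_cm * 1e7:.0f} nm")

    # Equal absorption and interaction lengths: full inversion required.
    print(required_excited_fraction(100.0, 100.0))
    # Tenfold longer interaction length: one atom in ten suffices.
    print(required_excited_fraction(100.0, 1000.0))
```

The second case shows why a longer interaction length is attractive: the required core-excited fraction falls in direct proportion to it.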

In the forward direction, stimulated X-ray emission has been demonstrated
in the gas phase. In this geometry, the maximum interaction
length at which stimulating photons can interact with core holes is 
intrinsically given by the absorption length. The sample has to absorb 
one X-ray photon per atom. If applied to solids, this energy usually 
destroys the bonding network and is hence not suited for spectroscopic 
studies of the undisturbed system. 

The situation is very different for other geometries. Along the lateral 
dimension of the X-ray focus, the interaction length can be signifi- 
cantly larger. With soft X-rays on solids, the absorption length of the 
exciting radiation with a photon energy that is slightly above an absorp- 
tion edge is typically tens or hundreds of nanometres, whereas the 
lateral dimensions of typical X-ray foci are a few to hundreds of micro- 
metres. The interaction length is then limited by the absorption length 
of the RIXS and XES signals with a photon energy below the absorp- 
tion edge, which are typically an order of magnitude larger than for the 
exciting radiation. Consequently, lower excitation densities are needed, 
when the observation direction is chosen to be along the lateral dimen- 
sion of the projected focus instead. This situation is sketched in Fig. 1, 
together with measured data of the total emission signal detected at 
several angles to the surface. We observe an emission maximum at 
around 9°, which is the result of the balance between the absorption 
length of the exciting radiation and the interaction length of the emit- 
ted radiation inside the excited volume. 

We also studied the emission with a spectrometer. Experimental constraints
meant that it was placed at around 15° to the sample surface.
We observed a characteristic dependence of the total number and the 
spectral distribution of the outgoing photons on the incoming photon 
flux (Figs 2 and 3). 

Before we analyse the experimental data, we discuss the relevant pro- 
cesses that have to be considered to derive the number of stimulated 
photons. The explicit formulation is given in the Methods section. 
Stimulated emission can be treated along the same lines as absorption. 
We can hence formulate a differential equation as a function of the inter- 
action length in an ansatz similar to the derivation of the Lambert-Beer 
law for absorption. This approach is inherently time-independent. Auger
decays are not explicitly treated, because Auger processes and radiative
emission are independent processes and are only related through
the number of core holes. Auger decays are implicitly included, though,
through the core-hole lifetime. Auger decays are the only other significant
decay channel of the core holes and stimulated emission reduces
the number of core holes, so we expect a substantial decrease in the number
of Auger processes. The effective lifetime of the core holes, taking
all decay channels into account, should be shortened.

Figure 1 | Geometry to observe spontaneously stimulated X-ray emission
from solids. a, X-rays create core excitations in the solid (yellow). A cascade of
stimulated emission builds up in a direction where the penetration depth $\lambda$ of
incoming photons is balanced by the absorption length $L$ for emission. b, The
total emission is detected as a function of glancing angle at fluences where
stimulated emission saturates. An enhancement is observed for shallow angles,
where the interaction length for emitted photons is longest. This direction is far
away from the specular reflection increase around 45°. The inset displays the
layout of the experiment. APD, avalanche photodiode.

We formulate a differential equation for the gain in the number of
observed stimulated photons. $N_\mathrm{stim,obs}$ is the number of stimulated
photons that are emitted into a detector with a finite acceptance angle
and detection efficiency:

$$\frac{\mathrm{d}N_\mathrm{stim,obs}}{\mathrm{d}x} = N_\mathrm{obs}\,\rho_\mathrm{ch}\,\sigma_\mathrm{stim} \tag{3}$$

where the cross-section for transitions between the respective valence
and the core state is $\sigma_\mathrm{stim}$ (now generally different from $\sigma_\mathrm{abs}$). The total
number of photons in the observation direction $N_\mathrm{obs}$ drives the observed
stimulation and is the sum of the number of spontaneously emitted
photons in this direction $N_\mathrm{sp,obs}$ (initially acting as a seed) and the
increasing contribution of $N_\mathrm{stim,obs}$ itself. Without stimulation, $N_\mathrm{sp,obs}$
is proportional to the number of the incoming photons. Stimulated
photons are available to stimulate further, which leads to a nonlinear
increase of the stimulated signal. At low photon numbers stimulation
is negligible and the spontaneous emission forms a linear lower limit
for the dependence of the observed number of photons as a function of
incoming photon numbers.
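
The gain equation can be explored numerically with a small Python sketch. It covers only the unsaturated regime and makes simplifying assumptions that go beyond the text: the core-hole density is held constant along the interaction length (no depletion by stimulated decays), and the spontaneous seed is treated as a fixed number rather than accumulating along the path. Under those assumptions the equation has the closed form $N_\mathrm{stim,obs}(x) = N_\mathrm{sp,obs}\,(e^{\rho_\mathrm{ch}\sigma_\mathrm{stim}x} - 1)$, which the sketch checks against a simple forward-Euler integration. Function names and example values are hypothetical.

```python
from math import exp


def stimulated_closed_form(n_seed, rho_ch, sigma_stim, length):
    """Closed-form solution of dN_stim/dx = (n_seed + N_stim) * rho_ch * sigma_stim
    with N_stim(0) = 0, constant rho_ch and a fixed seed (assumed)."""
    return n_seed * (exp(rho_ch * sigma_stim * length) - 1.0)


def stimulated_euler(n_seed, rho_ch, sigma_stim, length, steps=100_000):
    """Forward-Euler integration of the same equation, for comparison."""
    dx = length / steps
    n_stim = 0.0
    for _ in range(steps):
        n_stim += (n_seed + n_stim) * rho_ch * sigma_stim * dx
    return n_stim


if __name__ == "__main__":
    # Assumed illustrative values in cm-based units:
    # 5e21 core holes per cm^3, 1 Mbarn = 1e-18 cm^2, 1 um = 1e-4 cm.
    rho_ch, sigma_stim, length = 5e21, 1e-18, 1e-4
    for n_seed in (1.0, 10.0, 100.0):
        analytic = stimulated_closed_form(n_seed, rho_ch, sigma_stim, length)
        numeric = stimulated_euler(n_seed, rho_ch, sigma_stim, length)
        print(n_seed, analytic, numeric)
```

In this regime the stimulated signal stays proportional to the seed for a fixed core-hole density; the nonlinear rise with incoming photon number arises because the core-hole density itself grows with the incoming flux, and saturation requires the depletion term that this sketch deliberately omits.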

The next important factor is the instantaneous density of the core
holes available for stimulated decays. Although every photon in an X-ray
pulse is absorbed in the sample, the instantaneous number of core
holes is smaller than one per incoming photon, depending on the pulse
length and the core-hole lifetime: some core holes have already decayed
before the pulse is over. Besides the reduction through spontaneous
decay, the number of core holes is reduced by each stimulated photon.
To treat this effect, we split the decay rate of the core holes into two
parts: the Auger decays enter through the fixed core-hole lifetime, and
the stimulation directly reduces the number of core holes. Thus, the
change in the number of stimulated photons depends on $N_\mathrm{stim,obs}$ itself,
this time with a negative sign leading to saturation. An upper limit for
stimulated emission is reached, at which every core hole is stimulated.
For high incoming photon numbers, the observed number of photons
will thus saturate towards an upper linear limit as a function of incoming
photon numbers.

Figure 2 | Observing stimulated emission from a solid. a, Red symbols show
the measured number of counts per shot versus the number of incoming
photons. The error bars are one standard deviation of the averaged values. The
grey area falls between the limiting cases of spontaneous emission and
saturation of stimulated emission. The green line is a fit to the derived formula.
The inset shows the geometry of the experiment. b, The same data are plotted as
detected conversion, after dividing the detector counts by the number of
incoming photons. The linear limits now become constants.

The solution of the differential equation provides the dependence of 
the observed number of stimulated photons on the experimental para- 
meters. We approximate the lateral distribution of the incoming beam 
and the exponentially decaying absorption profile into the depth of the 
sample with a constant core-hole density (albeit modified through 
stimulated emission), inside a cylindrically shaped excitation volume, 
the size of which is given by the focal sizes and the absorption length of 
the incoming radiation. The interaction length for stimulation in this 
volume extends from the bottom of the excited cylinder to the sample 
surface in the direction of observation and is thus given only through 
the geometry of the experiment. 

For soft X-rays from current free-electron lasers, focal sizes of the 
order of 10 µm and pulse lengths around the core-hole lifetime are well
within reach. With typical absorption lengths in solids for the incoming
photons of around 100 nm and for the emission of around 1 µm
and with stimulation cross-sections of 1 Mbarn (ref. 26), stimulation 
Figure 3 | Spectrally resolved stimulated emission. a, The detected
conversion is shown, highlighting the integration regions for data plotted in 
b. b, The maximum detected conversions for different spectral regions are 
marked with arrows. The most intense features saturate first (green, 87-95 eV), 
followed by the emission shoulder (blue, 95-99.5 eV) and the weak multiple 
scattering background (black, 71-84 eV). The onset of saturation can be 
connected with the stimulation cross-section. Emission above the band gap 
(red, 99.5-108 eV) is not observed at low intensities. The signal is connected 
with X-ray-induced electronic excitations. 


becomes important at about 10"* incoming photons per pulse. Typical 
soft X-ray free-electron lasers produce up to 10° photons per shot, so 
that stimulated emission has to be considered as a dominating effect. 

We now turn to the experimental observations. In Fig. 1b, we show 
the angular dependence of the emission signal. The incidence angle is 
chosen to be 45° and we find enhanced intensity for the specular geo- 
metry with the detector at 45°. Additionally, we find a strong enhance- 
ment of the signal by about a factor of five at grazing angles around 9°, 
far from the specular reflection. It appears in a narrow angular window 
and is indeed the result of a nonlinear effect. Under the present excitation
conditions, we are deep in the regime where the stimulated emission 
saturates and the observed effects are variations in the angular distri- 
bution of the stimulated emission. The maximum signal is expected in 
a direction for which the interaction length inside the core excited volume 
is as large as possible and for which the reabsorption probability is 
minimized. Because the absorption length for the incoming radiation 
is about an order of magnitude shorter than the reabsorption length, 
the optimal angle is expected around arcsin(0.1) ≈ 6°, which is in
reasonably good agreement with the experiment. 
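
The geometric estimate above can be checked with a short Python sketch. The function name is illustrative, and taking the optimum as exactly the arcsine of the ratio between absorption and reabsorption length is only the rough argument given in the text, not a full model of the emission profile.

```python
import math

def optimal_grazing_angle_deg(absorption_length, reabsorption_length):
    """Rough estimate: grazing angle whose sine equals the length ratio."""
    ratio = absorption_length / reabsorption_length
    if not 0.0 < ratio <= 1.0:
        raise ValueError("expected 0 < absorption/reabsorption <= 1")
    return math.degrees(math.asin(ratio))

# An order of magnitude between the two lengths, as stated in the text.
angle = optimal_grazing_angle_deg(1.0, 10.0)
print(f"estimated optimal grazing angle: {angle:.1f} deg")
```

For a ratio of 0.1 this gives slightly less than 6°, which is the value compared with the measured enhancement around 9°.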

In Fig. 2a, we show the total number of observed photons in our spec- 
trometer placed at 15° to the surface plane, along with a fit to the 
expected dependence. We also include the linear limits derived from 
the fit. We observe that already in the lower fluence range studied, there 
is a deviation from linearity and the stimulated emission quickly satu- 
rates to the upper limit. In the chosen experimental geometry, the enhan- 
cement of the emission between purely spontaneous and saturated 
stimulation is only about a factor of two. This is because the majority
of the stimulated photons are radiated more grazing to the surface and 
do not reach our detector, as shown in Fig. 1b. Still, the nonlinear enhan- 
cement and saturation of the signal through stimulated emission are 
clearly observed. 

It is instructive to display the total number of observed photons divi- 
ded by the number of incoming photons (see Fig. 2b). This yields the 
conversion efficiency from incoming photons to emitted photons, as 
seen by the detector—the detected conversion. The limiting linear 
dependences turn into limiting constant conversion efficiencies and 
are determined by the experimental parameters. The nonlinearity of 
the curve becomes obvious as a monotonic change from the lower to
the upper limit. 

The photon number needed for the nonlinear part between the
limiting lines is also directly related to the emission-energy-dependent
stimulation cross-section. For larger stimulation cross-sections, the 
nonlinearity and thus the saturation of stimulated emission will occur 
at smaller incoming photon numbers. In Fig. 3a, we show the spectral 
evolution of the silicon L-edge emission depending on the number of 
incoming photons and how we separate the spectrum into parts of 
different emission intensity. The stimulation cross-section contains 
the same matrix elements as the spontaneous emission probability, 
which is in turn proportional to the measured spectral emission intens- 
ity, so the stimulation of emission becomes more effective for emis- 
sion-energy regions that already show a high intensity at low fluences. 
Figure 3b displays the dependence of the normalized detected conver- 
sion on the number of incoming photons in specific emission-energy 
regions. The peak region around an emission energy of 90 eV, as the 
most intense feature, approaches saturated stimulation fastest, and the 
other regions follow in the order of spectral intensity. 

The emission energy region for usually unoccupied states above the 
bandgap shows a strikingly different behaviour. Here, a secondary effect 
sets in. Photo-emitted electrons and electrons from Auger decays scat- 
ter inelastically in the sample, very quickly creating a multitude of delo- 
calized electron-hole pair excitations around the bandgap. This effect 
becomes stronger with an increasing number of incoming photons 
and the connected radiative decays annihilate core holes that contrib- 
uted to the stimulated emission of other parts of the spectrum at lower 
photon numbers. Therefore, the detected conversion in other parts of 
the spectrum actually shrinks rather than becoming constant. Never- 
theless, because all emission processes saturate when all the core holes 
are stimulated down, the total detected conversion still becomes con- 
stant (compare the integral detected conversion shown in Fig. 2b). 

In the direction of 15° from the surface, the detected conversion 
increases by a bit more than a factor of two between the limiting cases 
of spontaneous emission and saturated stimulated emission. By using a 
shallower angle, we can increase the gain further by another factor of 
five (see the difference in the saturated emission signal in Fig. 1b). This 
value is limited by the round shape of the free-electron laser beam foot- 
print on the sample. The shape of the beam leads to the same maximi- 
zed interaction length in every azimuthal direction in the sample surface 
plane and thus to a stimulated emission profile which is rotationally 
symmetric around the sample normal at the centre of the beam. 

The observed gain can clearly be further increased when the foot- 
print of the irradiated photons is specifically shaped. An elongation of 
the footprint in the observation direction will further enhance the obser- 
ved photon numbers, whereas in other directions the signal will be 
reduced. To ensure temporal overlap of the core excitations in the sam- 
ple and the majority of the stimulated photons travelling with the speed 
of light, the footprint of the exciting beam needs to be elongated by 
placing the sample grazing to the incident beam. In this way, the exciting 
wavefront will travel along the sample surface in time with the stimu- 
lated photons and the enhancement of the emission signal will be 
maximized while X-ray-induced sample damage is further minimized. 

The same applies to soft X-ray experiments from other systems, 
because the determining parameters are very similar for most materials. 
The signal gain that we expect as compared to normal RIXS or XES
experiments stems from two sources. On one hand, the saturation of 
stimulated emission is an indicator that most core holes decay via the 
emission of photons instead of decaying via Auger processes. This 
pushes the observed fluorescence yield from the 10 * level in the 
direction of unity. On the other hand, directing the majority of the 
emission towards the detector can substantially increase the signal, 
given that typical fluorescence detectors (and our spectrometer) have a
geometrical acceptance of less than 10° * of the full solid angle. There- 
fore, stimulated emission can enhance the detected signal by several 
orders of magnitude. In our experiment we did observe traces of 
sample damage. Generally though, with optimized geometries at opti- 
mized X-ray sources, the signal gain can be used to minimize acquisi- 
tion times in the non-damaging regime, or alternatively to record full 
spectra in a single X-ray shot, so that the spectroscopic information is 
essentially generated before X-ray-induced changes set in. 

Such enhanced signal levels enable highly selective RIXS and XES 
studies with considerably improved energy resolution, so that new low- 
energy excitations may be discovered. Through the combination with 
pump-probe techniques, huge parameter spaces in the ultrafast time 
domain can quickly be mapped. Finally, through the combination of non- 
linear techniques from optical lasers with the high specificity of X-rays, 
small signals from extremely dilute active centres can now be dissected. 


METHODS SUMMARY 

Photon parameters. The experiments were conducted at the free-electron laser 
FLASH at Hamburg, Germany, operating at a photon energy of 115 eV. The spec- 
trally resolved data was recorded at a repetition rate of 5 Hz with 30-fs pulses!”””. 
We used the grating of beamline PG 2 as a mirror’’. The focus was round with a 
diameter of 45 µm, leading to fluences up to 1 J cm⁻². The angular dependence was
measured at beamline BL 2 (ref. 29) at a burst repetition rate of 10 Hz, each burst
including 100 pulses at 250 kHz. The focal size was adjusted to be about 40 µm to
compensate for the now 50-fs-long pulses. With a gas attenuator, we set the fluence
around 1 J cm⁻². The incoming photon numbers have been measured for each
shot by a gas monitor detector”. 

Experimental setup. We scanned hydrogen-passivated silicon (100) surfaces through 
the beam. Spectra were recorded with a Scienta XES 355 spectrometer” and a single-
photon-counting, multi-hit-capable detector centred at 92 eV with a resolution of
0.4 eV. The scattering plane was the horizontal plane of polarization of the incom- 
ing photons with the spectrometer at around 85° to the beam (15° to the sample 
surface), while the pulses impinged at an angle of 80° to the surface. For the angu- 
lar dependence, we used an avalanche photodiode with an aluminium filter to block 
optical light. The incidence angle was 45°, whereas the detection angle was varied. 
Analysis. With our spectrometer, we recorded around 40,000 single shots. The data 
was sorted, binned and the standard deviations of the averaged values were com- 
puted for each bin. With the photodiode, we analysed 15,000 bursts. During free- 
electron laser irradiation, the emission from a plasma plume was optically visible 
at the sample and slight ablation was observed. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS


Photon parameters. The experiments have been conducted during two separate 
runs at the free-electron laser FLASH at Hamburg, Germany. The machine was operated
at a central photon energy of about 115 eV. In one run, we recorded the spectrally
resolved data. The machine was operated at a repetition rate of 5 Hz with typically
30-fs pulses’”””. We used beamline PG 2 with the 1,200-lines-per-millimetre grat- 
ing in zero order, thus reflecting the incident beam without dispersion*. To 
maximize the photon flux, a grazing angle of 3.5° was used on the grating. The 
spot on the sample was round with a diameter of about 45 µm, determined through
measurements on a fluorescence screen and permanent imprints studied under a
microscope. These parameters lead to fluences up to 1 J cm⁻². The incoming
photon numbers have been measured for each shot by a gas monitor detector”. 

In another experimental campaign, the angular dependence of the stimulated
emission signal was studied. Our setup was placed at beamline BL 2 (ref. 29). After
an upgrade of the FLASH accelerator, a burst repetition rate of 10 Hz became
available, each burst including 100 pulses at 250 kHz. The pulse length was, at about
50 fs, slightly longer and the spot size on the sample was adjusted to be about 40 µm
for compensation. With a gas attenuator, we set the fluence at around 1 J cm⁻², the
upper limit of the spectrally resolved data.
Experimental setup. Hydrogen-passivated silicon (100) surfaces were scanned 
through the beam. The spectra have been recorded with a commercial Scienta XES 
355 spectrometer” using the 300-lines-per-millimetre grating. The counts have been 
detected by a multi-channel-plate, phosphor-screen combination that can count sin- 
gle photons and is multi-hit capable. The detection window was centred at 92 eV and 
the resolution was set to 0.4 eV. The scattering plane was the horizontal plane of pola- 
rization of the incoming photons. The spectrometer was mounted at around 85° to 
the beam (15° to the sample surface), while the pulses impinged at an angle of 80° to 
the surface. 

For the angular dependence, we used an avalanche photodiode with an alumi- 
nium filter to block optical light. The incidence angle on the sample was 45°, whereas 
the detection angle was varied. 

Data analysis. With our spectrometer, we recorded around 40,000 single shots. 
The data was sorted and binned and the standard deviations of the averaged values 
were computed for each bin. With the photodiode, we analysed 15,000 bursts. During 
free-electron laser irradiation, the emission from a plasma plume was optically 
visible at the sample and slight ablation was observed. 
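
The sorting and binning step can be sketched in Python as follows. This is a minimal illustration, not the analysis code used for the experiment: the function name, the bin edges and the toy numbers are hypothetical, and interpreting "standard deviations of the averaged values" as the standard deviation of the mean in each bin is an assumption.

```python
import math
from statistics import mean, stdev

def bin_shots(photons_in, counts, edges):
    """Group shots by incoming photon number.

    Returns one (bin_centre, mean_counts, std_of_mean) tuple for every
    bin that contains at least two shots.
    """
    bins = [[] for _ in range(len(edges) - 1)]
    for n_in, c in zip(photons_in, counts):
        for i in range(len(edges) - 1):
            if edges[i] <= n_in < edges[i + 1]:
                bins[i].append(c)
                break
    result = []
    for i, values in enumerate(bins):
        if len(values) >= 2:
            centre = 0.5 * (edges[i] + edges[i + 1])
            std_of_mean = stdev(values) / math.sqrt(len(values))
            result.append((centre, mean(values), std_of_mean))
    return result

# Toy data (hypothetical): six shots sorted into two bins.
toy = bin_shots([1, 2, 3, 11, 12, 13], [1, 2, 3, 4, 6, 8], [0, 10, 20])
for centre, avg, err in toy:
    print(centre, avg, round(err, 3))
```

Each tuple corresponds to one data point with its error bar in a plot of counts per shot versus incoming photon number.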

The instantaneous core-hole density. As described in the main text, the instant- 
aneous core-hole density contains the main approximations in our theoretical 
treatment. Without stimulation, the core-hole density is assumed to be constant 
inside a cylinder formed by the exciting radiation, and outside it is assumed to be 
zero. The dimensions of this cylinder in the surface plane of the sample are given 
by the distribution of the incoming photons: the measured focal width and height 
($w$ and $h$). The depth of the cylinder is taken as the tabulated absorption length of
the incoming radiation $\lambda$. By choosing 1/e dimensions for this volume, the integral
number of core holes is the same as for the actual distributions and given through 
the number of incoming photons. We thus scale the core-hole density to yield the 
same integrals and the same second moments as the actual distributions. 

We treat the temporal distribution of the core holes as follows: core-hole decays
during the pulse length reduce the instantaneous core-hole density. The temporal
distribution of the number of core holes in the sample at a given time $t$, that is,
$N_{\mathrm{ch}}(t)$, is the solution of the following differential equation:

$$\frac{dN_{\mathrm{ch}}(t)}{dt} = N_{\mathrm{in}}(t) - \frac{N_{\mathrm{ch}}(t)}{\tau_l}$$

with the temporal distribution of the exciting photons $N_{\mathrm{in}}(t)$ as the source term of
this inhomogeneous differential equation and the core-hole lifetime $\tau_l$. For a
temporal Gaussian pulse of incoming photons, the solution is the product of an
exponentially decaying function and an error function for the creation of core holes. To cast
the analytical solution into the same approximation as above, where the core-hole
density (without stimulation) is constant during a specific time period, we analyse
$N_{\mathrm{ch}}(t)$ as follows.

The integral of $N_{\mathrm{ch}}(t)$ per incoming photon, that is, the total number of core
holes per photon present at each moment in time, is proportional to $\tau_l$. The longer
the lifetime, the more core holes are present, because they have not yet decayed.
The second central moment of $N_{\mathrm{ch}}(t)$ for temporal Gaussian pulses is $\tau_l^2 + \tau_p^2$ with
the pulse length $\tau_p$. The width of this function is approximated by the square root
of the second moment. The constant function that yields the same integral over
time ($\tau_l$), and is zero outside a window as large as the square root of the second
moment of $N_{\mathrm{ch}}(t)$, is thus given by:

$$T = \frac{\tau_l}{\sqrt{\tau_l^2 + \tau_p^2}}$$

This expression also accounts for the lowering of the number of core holes
through spontaneous decay, because the decays are included in the core-hole
lifetime.
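
The two moments used above can be verified numerically. The following Python sketch integrates the rate equation dN_ch/dt = N_in(t) - N_ch/τ_l for a unit-area Gaussian pulse and evaluates the integral and the second central moment of N_ch(t). It assumes that the pulse length τ_p is the standard deviation of the Gaussian (the text does not say whether it is a standard deviation or a full width), and the function name, step size and integration window are illustrative choices.

```python
import math

def core_hole_moments(tau_l, tau_p, dt=0.05):
    """Integrate dN_ch/dt = N_in(t) - N_ch/tau_l for a unit-area Gaussian pulse.

    Returns (integral, second central moment) of N_ch(t). All times in fs.
    Assumes tau_p is the standard deviation of the Gaussian pulse.
    """
    t = -10.0 * tau_p
    t_end = 10.0 * tau_p + 40.0 * tau_l
    decay_full = math.exp(-dt / tau_l)        # decay over one full step
    decay_half = math.exp(-0.5 * dt / tau_l)  # decay from mid-step to step end
    norm = 1.0 / (tau_p * math.sqrt(2.0 * math.pi))
    n_ch = 0.0
    m0 = m1 = m2 = 0.0
    while t < t_end:
        t_mid = t + 0.5 * dt
        source = norm * math.exp(-0.5 * (t_mid / tau_p) ** 2)
        # Core holes created at mid-step decay for half a step.
        n_ch = n_ch * decay_full + source * dt * decay_half
        t += dt
        m0 += n_ch * dt
        m1 += n_ch * t * dt
        m2 += n_ch * t * t * dt
    mean_t = m1 / m0
    return m0, m2 / m0 - mean_t * mean_t

integral, variance = core_hole_moments(tau_l=19.0, tau_p=30.0)
print(round(integral, 2), round(variance, 1), round(19.0 / math.sqrt(variance), 3))
```

With the 19 fs lifetime and 30 fs pulse length quoted in the Methods, the integral comes out close to τ_l and the second central moment close to τ_l² + τ_p², so the ratio τ_l/√(τ_l² + τ_p²) is about 0.54.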

The instantaneous core-hole density without stimulation is taken to be
constant; only stimulation is taken to annihilate core holes. The number of total
stimulated photons is taken to be $K$ times bigger than observed (accounting for
angular dependences and the finite detector acceptance and efficiency). The
instantaneous core-hole density $\rho_{\mathrm{ch}}$ for the differential equation is thus given by:

$$\rho_{\mathrm{ch}} = p\,(N_{\mathrm{in}} - K N_{\mathrm{stim,obs}}), \qquad p = \frac{4T}{\pi w h \lambda}$$


The number of stimulating photons. The number of stimulating photons in the
observation direction has been termed $N_{\mathrm{obs}}$. This value is given by the sum of the
spontaneously emitted photons in the observation direction and the stimulated
photons in the observation direction $N_{\mathrm{stim,obs}}$. We introduce the acceptance and
detection efficiency $A$ of our spectrometer, weighted with the possible angular
distributions of the spontaneous emission.

The total number of photons that are spontaneously emitted in the observed
direction increases across the interaction region, starting from zero at the far end of
the observed volume to the number of incoming photons $N_{\mathrm{in}}$ times the
fluorescence yield $\omega_{\mathrm{fl}}$ and the acceptance ($A\omega_{\mathrm{fl}}N_{\mathrm{in}}$). We approximate this increase with
a constant across the interaction region. We take half the maximal value to yield 
the correct average value for a linear increase. This approximation ensures integr- 
ability and neglects the aspects around the temporal evolution, travel times of 
photons inside the interaction region and so on. 

Given that stimulated emission annihilates core holes (and K times more sti- 
mulated photons are emitted than are observed because of the finite acceptance of 
the detector), the number of spontaneously emitted photons is reduced with 
higher stimulation. We thus introduce the source term for stimulated emission as: 


$$N_{\mathrm{obs}} = q\,(N_{\mathrm{in}} - K N_{\mathrm{stim,obs}}) + N_{\mathrm{stim,obs}}, \qquad q = \tfrac{1}{2} A \omega_{\mathrm{fl}}$$


The full source term should in principle also reflect the spatial growth of the
seeding spontaneous emission across the interaction region, which would lead to a
slightly different nonlinearity in the stimulated emission signal, especially for low
photon numbers. In reality, this effect is more than compensated by the bigger
effect of the temporal mismatch of the seeding spontaneous emission: for
near-normal impinging irradiation, the seed appears at the same time across the whole
interaction region in a time window of around $\sqrt{\tau_l^2 + \tau_p^2} \approx 35$ fs (see above),
whereas the travel time of the stimulated field across the 50-µm-long interaction
region is already around 170 fs. This can only be overcome by using a different
geometry, as proposed in the main text.
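
The two time scales in this argument can be reproduced with a few lines of Python. The sketch assumes a refractive index of one for the soft X-ray field (vacuum speed of light) and uses the 19 fs lifetime and 30 fs pulse length quoted in the Methods; the function names are illustrative.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s, vacuum value (refractive index assumed to be 1)

def seed_window_fs(tau_l_fs, tau_p_fs):
    """Width of the core-hole time window, sqrt(tau_l^2 + tau_p^2), in fs."""
    return math.hypot(tau_l_fs, tau_p_fs)

def travel_time_fs(length_m):
    """Time for light to cross the interaction region, in fs."""
    return length_m / SPEED_OF_LIGHT * 1e15

print(round(seed_window_fs(19.0, 30.0), 1))  # window in which the seed appears
print(round(travel_time_fs(50e-6)))          # crossing time of a 50 µm region
```

The crossing time exceeds the seed window by roughly a factor of five, which is the temporal mismatch referred to above.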
The full differential equation. In the differential equation for the growth of 
stimulated emission along the interaction region, we also treat the reabsorption 
of emitted photons explicitly, with reabsorption length L. With the above intro- 
duced abbreviations, we thus solve: 


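$$\frac{dN_{\mathrm{stim,obs}}}{dx} = \left(qN_{\mathrm{in}} + (1 - qK)N_{\mathrm{stim,obs}}\right)\left(pN_{\mathrm{in}} - pKN_{\mathrm{stim,obs}}\right)\sigma_{\mathrm{stim}} - \frac{N_{\mathrm{stim,obs}}}{L}$$

with $N_{\mathrm{stim,obs}}(x = 0) = 0$.

To get the total number of photons measured by the detector, $N_{\mathrm{tot,obs}}$, we must
add the spontaneous emission, including the reduction through stimulated emission
and the reabsorption along the interaction length $x$, which is $\exp(-x/L)$:

$$N_{\mathrm{tot,obs}} = 2q\,e^{-x/L}\,(N_{\mathrm{in}} - K N_{\mathrm{stim,obs}}) + N_{\mathrm{stim,obs}}$$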

The solution and its limits. In the solution of the differential equation further
combinations of parameters appear. $X = x/L$ is the ratio between the actual
interaction length and the reabsorption length. This number is given by the
experimental geometry and can thus be derived from tabulated values. $\Phi = pL\sigma_{\mathrm{stim}}$ is a
dimensionless number that relates the excited volume $p^{-1}$ to a stimulation volume
$L\sigma_{\mathrm{stim}}$. The inverse value of $\Phi$ is connected to the number of photons that is needed
to observe the nonlinear increase in signal due to stimulated emission.
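The full solution thus reads:

$$N_{\mathrm{tot,obs}} = 2e^{-X}qN_{\mathrm{in}} + \left(1 - 2e^{-X}qK\right)\frac{1}{2K\Phi(qK - 1)}\Bigg\{1 + \Phi N_{\mathrm{in}}(2qK - 1) - \sqrt{1 + \Phi N_{\mathrm{in}}(\Phi N_{\mathrm{in}} + 4qK - 2)}\;\tanh\Bigg[\frac{\sqrt{1 + \Phi N_{\mathrm{in}}(\Phi N_{\mathrm{in}} + 4qK - 2)}\,X}{2} + \operatorname{arctanh}\frac{1 + \Phi N_{\mathrm{in}}(2qK - 1)}{\sqrt{1 + \Phi N_{\mathrm{in}}(\Phi N_{\mathrm{in}} + 4qK - 2)}}\Bigg]\Bigg\}$$

For small incoming photon numbers, the signal is proportional to the number of
incoming photons, with purely spontaneous emission (with reabsorption) as linear
limit:

$$N_{\mathrm{tot,obs}} \xrightarrow{\;N_{\mathrm{in}} \to 0\;} 2e^{-X}qN_{\mathrm{in}} = e^{-X}A\omega_{\mathrm{fl}}N_{\mathrm{in}}$$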


For very high numbers of incoming photons, the tanh in the formula nears unity
and we obtain the following expression for the number of observed photons:

$$N_{\mathrm{tot,obs}} \xrightarrow{\;N_{\mathrm{in}} \to \infty\;} \frac{N_{\mathrm{in}}}{K}$$


This expression relates the saturation of stimulated emission to the geometry- 
dependent K factor that describes how much more stimulated emission happens 
in other directions than observed. Because stimulated emission is a highly non- 
linear effect, this factor can vary largely, depending on the shape of the interaction 
region and how the observation direction is oriented relative to it.

In our time-independent constant average discussion, the saturation of stimu- 
lated emission means a complete suppression of the Auger channel. In reality, 
Auger processes will still take place during the less intense beginning of the pulse, 
at the edges of the irradiated volume, as well as when the excited volume and the 
stimulated photons lose temporal overlap. Nevertheless, these contributions take 
out a constant fraction of the temporal and spatial evolution of stimulated emission
and are thus linearly dependent on $N_{\mathrm{in}}$. Therefore, the convergence to a linear
dependence is not altered. 
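
As a consistency check of the model, the following Python sketch integrates the differential equation numerically and compares the result with the closed-form solution, written in the reduced length ξ = x/L so that dN/dξ = Φ(qN_in + (1 - qK)N)(N_in - KN) - N with N(0) = 0. The closed form used here, including the factor 2KΦ(qK - 1) in the denominator, follows from solving this equation and is an assumption about how the printed formula should be read. The parameter values are the fitted values as far as they can be read from the text (q ≈ 2.6 × 10⁻¹¹, K ≈ 1.1 × 10¹⁰, Φ ≈ 3.1 × 10⁻¹¹, X = 0.285); the function names are illustrative, and the closed form requires qK < 1 and a positive photon number.

```python
import math

def n_stim_closed(n_in, q, K, phi, X):
    """Closed-form N_stim,obs at reduced interaction length X = x/L (needs q*K < 1)."""
    a = 1.0 + phi * n_in * (2.0 * q * K - 1.0)
    root = math.sqrt(1.0 + phi * n_in * (phi * n_in + 4.0 * q * K - 2.0))
    arg = 0.5 * root * X + math.atanh(a / root)
    return (a - root * math.tanh(arg)) / (2.0 * K * phi * (q * K - 1.0))

def n_stim_rk4(n_in, q, K, phi, X, steps=2000):
    """Fourth-order Runge-Kutta integration of the same equation from 0 to X."""
    def rhs(n):
        return phi * (q * n_in + (1.0 - q * K) * n) * (n_in - K * n) - n
    h = X / steps
    n = 0.0
    for _ in range(steps):
        k1 = rhs(n)
        k2 = rhs(n + 0.5 * h * k1)
        k3 = rhs(n + 0.5 * h * k2)
        k4 = rhs(n + h * k3)
        n += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n

def n_tot_obs(n_in, q, K, phi, X):
    """Observed photons: reduced, reabsorbed spontaneous part plus stimulated part."""
    n_stim = n_stim_closed(n_in, q, K, phi, X)
    return 2.0 * math.exp(-X) * q * (n_in - K * n_stim) + n_stim

# Fit values as read from the text (assumed exponents, see lead-in).
Q, K_FACTOR, PHI, X_RED = 2.6e-11, 1.1e10, 3.1e-11, 0.285

for n_in in (1e9, 1e10, 1e11, 1e12):
    conversion = n_tot_obs(n_in, Q, K_FACTOR, PHI, X_RED) / n_in
    print(f"N_in = {n_in:.0e}: detected conversion = {conversion:.3e}")
```

The detected conversion rises monotonically from the spontaneous limit 2e^(-X)q at low photon numbers towards the saturation limit 1/K at high photon numbers, which is the behaviour of the fitted curve between the two limiting lines in Fig. 2.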

Fitting the model to the experimental data. We fit the obtained solution of the
differential equation for $N_{\mathrm{tot,obs}}$ to the observed dependence of the signal on the
incoming photon numbers recorded with the spectrometer as shown in Fig. 2a and
b. We determine the interaction length $x$ from the experimental geometry: with the
incoming beam at 80° to the surface, the observation direction at 15° to the surface
and an absorption length of the exciting radiation of 45 nm (ref. 31), we estimate
an interaction length of 171 nm and obtain $X = 0.285$ with reabsorption length
$L = 600$ nm (ref. 31). The parameters $q$, $K$ and $\Phi$ are then fitted to the data. We
obtain $q = (2.6 \pm 0.3) \times 10^{-11}$, $K = (1.100 \pm 0.003) \times 10^{10}$
and $\Phi = (3.1 \pm 0.3) \times 10^{-11}$. The fitting error is given as one standard
deviation.
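
The quoted interaction length can be reproduced from the geometry with a short Python sketch. It assumes that the excited depth is the absorption length projected onto the surface normal and that the interaction length is this depth seen along the observation direction; this reading reproduces the quoted numbers but is an assumption, and the function name is illustrative.

```python
import math

def interaction_length(absorption_length, incidence_deg, observation_deg):
    """Path from the bottom of the excited volume to the surface along the
    observation direction; both angles are measured from the sample surface."""
    depth = absorption_length * math.sin(math.radians(incidence_deg))
    return depth / math.sin(math.radians(observation_deg))

x_nm = interaction_length(45.0, 80.0, 15.0)  # lengths in nm
X = x_nm / 600.0                             # reabsorption length L = 600 nm
print(round(x_nm), round(X, 3))
```

With an absorption length of 45 nm, incidence at 80° and observation at 15°, this gives about 171 nm and X ≈ 0.285.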

Although we did not record many data points at very low intensities, the extra- 
polation of our data towards zero incoming photons can readily be done (Fig. 2b). 
Besides the reabsorption factor $e^{-X}$, this value directly yields the parameter $q$ with a
rather small error. The fitted value agrees well with our expectations and can be 
decomposed into the fluorescence yield* (wy = 3.8 X 10 *), the angular accept- 
ance of our spectrometer® of around 10° and the grating and detection effi- 
ciency” of 7 X 10°. 

The saturation limit 1/K of the signal is close to the value of q. Owing to the good 
data quality at higher photon numbers, we find a very small fitting error. The value 
of K signifies that many more stimulated photons are emitted in other directions 
than what we observe with our spectrometer. This finding is corroborated by our 
angle-dependent study and can be further optimized by choosing different geo- 
metries, as described in the main text. 

The fitting parameter $\Phi$ is the main unknown in this study and accounts for the
theoretical approximations as well as uncertainties in determining experimental
parameters and further unconsidered nonlinear effects. We decompose $\Phi$ into the
factors $p = 7.5 \times 10^{15}$ m⁻³ given through the experimental parameters (focal
width and height 45 µm, absorption length 45 nm (ref. 31), pulse length’’ 30 fs and the
core-hole lifetime” of 19 fs), the reabsorption length of 600 nm (ref. 31) and the
stimulation cross-section $\sigma_{\mathrm{stim}}$. For $\sigma_{\mathrm{stim}}$, we thus obtain a value of $6.9 \times 10^{-21}$ m², which
is about a factor of 25 bigger than the absorption cross-section. We have indications
from rough calculations that the overlap between valence band and core
states is actually about a factor of 2-3 bigger than for conduction band and core 
states, which leads to a larger stimulation cross-section than absorption cross- 
section. The remaining order of magnitude can be explained through the non- 
linearity of the effect. This will nonlinearly enhance the signal from regions with 
higher excitation density, so that the effective focal size, excitation depth and dura- 
tion are actually shorter than estimated. Therefore, we conclude that the fitted 
parameters agree reasonably well with our expectations. 
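
The decomposition of the fitted parameter can be retraced numerically. The Python sketch below assumes the density prefactor p = 4T/(πwhλ) with T = τ_l/√(τ_l² + τ_p²), which is a reading of the Methods formulas rather than a confirmed expression, and a fitted value Φ ≈ 3.1 × 10⁻¹¹; the function name is illustrative.

```python
import math

def density_prefactor(w, h, lam, tau_l, tau_p):
    """Core-hole density per incoming photon, p = 4T / (pi * w * h * lam).

    Lengths in metres; tau_l and tau_p only need to share a unit.
    """
    T = tau_l / math.hypot(tau_l, tau_p)
    return 4.0 * T / (math.pi * w * h * lam)

p = density_prefactor(w=45e-6, h=45e-6, lam=45e-9, tau_l=19.0, tau_p=30.0)
PHI_FIT = 3.1e-11        # fitted dimensionless parameter (assumed exponent)
L_REABS = 600e-9         # reabsorption length in metres
sigma_stim = PHI_FIT / (p * L_REABS)

print(f"p = {p:.2e} m^-3")
print(f"sigma_stim = {sigma_stim:.2e} m^2")
```

Under these assumptions the prefactor is about 7.5 × 10¹⁵ m⁻³ and the stimulation cross-section about 6.9 × 10⁻²¹ m², consistent with the values discussed above.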
Stereoinversion of tertiary alcohols to tertiary-alkyl isonitriles and amines


Sergey V. Pronin¹, Christopher A. Reiher¹ & Ryan A. Shenvi¹


The S_N2 reaction (bimolecular nucleophilic substitution) is a well-known
chemical transformation that can be used to join two smaller molecules
together into a larger molecule or to exchange one functional group for
another. The S_N2 reaction proceeds in a very predictable manner:
substitution occurs with inversion of stereochemistry, resulting from the
‘backside attack’ of the electrophilic carbon by the nucleophile. A
significant limitation of the S_N2 reaction is its intolerance for tertiary
carbon atoms: whereas primary
and secondary alcohols are viable precursor substrates, tertiary 
alcohols and their derivatives usually either fail to react or produce 
stereochemical mixtures of products’. Here we report the stereo- 
chemical inversion of chiral tertiary alcohols with a nitrogen- 
ous nucleophile facilitated by a Lewis-acid-catalysed solvolysis. 
The method is chemoselective against secondary and primary alcohols,
thereby complementing the selectivity of the S_N2 reaction.
Furthermore, this method for carbon-nitrogen bond formation 
mimics a putative biosynthetic step in the synthesis of marine 
terpenoids* and enables their preparation from the corresponding 
terrestrial terpenes. We expect that the general attributes of the 
methodology will allow chiral tertiary alcohols to be considered 
viable substrates for stereoinversion reactions. 

The incorporation of nitrogen during secondary metabolite biosyn- 
thesis occurs almost exclusively through the chemistry of carbonyls and 
imines: reductive amination, Mannich reactions and transamination®. 
The ability to mimic these biosynthetic pathways by chemical synthesis 
has vastly simplified the production of complex alkaloids. However, 
many alkaloids produced in marine environments do not adhere to the 
established biosynthetic patterns of terrestrial organisms (see Fig. 1a). For 
example, the nitrogen atoms in the terpenoid metabolites 1-6 derive 
from inorganic cyanide by means of reactions that are not well charac- 
terized but certainly do not involve carbonyl or imine substrates. These 
reactions may involve the attack of high energy, unstabilized carbocations 
by cyanide or a nucleophile derived from cyanide (Fig. 1b)*. Indeed, the 
primary products of these pathways are isonitriles®. 

Mimicry of this proposed biosynthetic transformation is challenging 
because, apart from substrate control’, intermolecular attack of unsta- 
bilized carbocations by cyanide or organonitriles is non-stereoselective. 
Furthermore, the high acidity of protons adjacent to carbocations facilitates
E1 (unimolecular) elimination with most counteranions*, which
can erase adjacent stereogenic centres. Therefore, even if reprotonation
of the nascent alkene is possible with Brønsted acid, regioisomers and
further stereoisomers can result. 

Here we demonstrate a chemoselective and stereoselective method 
for incorporating cyanide into terpenoid frameworks by the stereoin- 
version of tertiary alcohols. This method is uniquely competent to 
generate in concise synthetic sequences the primary pharmacophores 
of isocyanoterpenes’, some of which have shown potent antimalarial 
activity’. The method is also generally useful for the conversion of 
alcohols to amines and is chemoselective for tertiary alcohol displacement 
in preference to secondary or primary alcohols. Previous approaches to 
the synthesis of terpenoids exemplified by Fig. 1 have relied on either 
abiotic strategies for formation of the C-N bond” or biomimetic reactions 


of nitriles with carbocations that rely on stereocontrol imparted by a 
cyclic scaffold’*. This latter strategy has proved to be capricious, because 
many terpenoid cations lead to isomeric mixtures or non-natural 
stereoisomers on capturing nitriles'*’°. Previous studies did, however, 
establish the idea of using a Lewis acidic reagent to ionize a tertiary 
leaving group in the presence of trimethylsilyl cyanide (TMSCN) as a 
competent nucleophile and isonitrile precursor. For instance, tertiary 
halides will react with TMSCN in the presence of TiCl₄ to produce
isonitriles'®. Similar reactions have simplified the synthesis of a marine 
bisisonitrile; however, the late-stage isocyanation resulted in a near 
equimolar mixture of four stereoisomers'*. Subsequently the ionization 
of tertiary alcohols with superstoichiometric zinc halides in the pres- 
ence of TMSCN was developed’’. However, this latter method is neither 
stereoselective nor chemoselective and results in indiscriminate isocyanation,
elimination and isomerization of, for example, α-bisabolol 10
(Fig. 2a). A stereocontrolled Mitsunobu isocyanation of tertiary alco- 
hols was also reported'*, but in our hands this reaction could not be 
successfully applied to the isocyanoterpenes’. 

We believed that a stereocontrolled displacement of alcohols or esters 
could be effected if TMSCN were used in excess and could therefore 
stereoselectively trap the nascent cation at the contact ion-pair stage’”®. 
Figure 1 | Nitrogenous marine terpenoids derived from inorganic cyanide.
a, A subfamily of marine terpenoids represented by 1-6 are decorated with 
nitrogen atoms in a different pattern from that observed in most terrestrially 
derived metabolites. b, From isotope-feeding studies the hypothetical biosynthetic 
path (7→8→9) common to 1-6 is known to incorporate inorganic cyanide and is
proposed to involve unstabilized carbocations as biosynthetic intermediates. 
Figure 2 | Development of a stereoselective Ritter-type reaction through
tertiary alcohol inversion. a, Prior state-of-the-art isocyanation of tertiary
alcohols applied to α-bisabolol 10 produces complex mixtures. GC, gas
chromatography. b, Optimization studies identify Sc(OTf)3 as an optimal


The challenge of this approach lies in the competitive equilibria of 
Lewis acid-catalysed solvolysis. First, the Lewis-basic TMSCN and the 
Lewis-basic leaving group compete for association with the requisite 
Lewis acid. In the extreme case of solvolysis, the Lewis acid might be 
coordinatively saturated with non-labile solvent and rendered unreac- 
tive. Second, the counteranion that is necessary for steric and/or elec- 
tronic shielding of one face of the carbocation can undergo equilibrium 
displacement with TMSCN”, which would result in a stereoisomeric 
mixture of solvolysis products. Unfortunately, whereas chiral tertiary 
chlorides” are viable substrates for solvolysis in alkaline methanol’, ter- 
tiary esters are known to undergo almost complete racemization in acid’. 

Indeed, attempted solvolysis in TMSCN with zinc and magnesium 
Lewis acids fails to consume any decalinyl trifluoroacetate 11 (Fig. 2b, 
entries 1 and 2; see also Supplementary Information for a complete list 
of Lewis acids screened). Bismuth triflate (entry 3) shows slow con- 
version of 11 to decalinyl isonitriles, but as an equimolar mixture of 
equatorial and axial stereoisomers. However, we found that early 
transition metal triflates (entries 4 and 5) yield a high ratio of diastereomers 
favouring stereoinversion (12). Scandium(III) trifluoromethanesulphonate 
(scandium(III) triflate; Sc(OTf)3) catalyses solvolysis 
much more rapidly than Y(OTf)3, possibly as a result of its high rate 
of ligand exchange; the reaction can therefore be conveniently run at catalyst 
loadings as low as 3 mol%. The solvolysis was found to be ineffective 
with alcohols, and to be less stereoselective with acetyl and formyl esters, 
although the reaction rate for these latter substrates was much higher 
than for the trifluoroacetate. Given this trend, we expected selectivity to 
increase with greater fluorination of the leaving group. For 11, the 
results are roughly equivalent (entries 9 and 10), although linear sub- 
strates show improved selectivity when the leaving group is a longer- 
chain perfluoroalkanoate (see below). 

The optimal conditions for the solvolysis of decalin 11 were found to 
be generally applicable to stereogenic tertiary trifluoroacetates, and as a 
result 13 can be cleanly converted to isonitrile 14, which is the enan- 
tiomer of a naturally occurring marine terpenoid (Fig. 2c)’. 

The generality, functional group tolerance and utility of this solvo- 
lysis reaction are illustrated by Fig. 3. When a tertiary alcohol is the sole 
stereocentre in a molecule, the percentage inversion of 15, for instance, 
is 90%, accounting for enantiomeric purity of the alcohol. It should be 
noted that acyclic perfluorobutyrates showed slightly higher stereoselectivity 
in solvolysis (at −5 °C with 10% (v/v) CH2Cl2) than the 

[Fig. 2b scheme: 11 → 12; Lewis acid, 15 equiv. TMSCN, 22 °C; structures omitted]

Entry  Proportion (mol%)  Lewis acid  Leaving group  (%)   d.r.
1      10                 ZnBr2       TFA            0     n/a
2      5                  Mg(OTf)2    TFA            0     n/a
3      5                  Bi(OTf)3    TFA            14    49:51
4      5                  Y(OTf)3     TFA            70    84:16
5      3                  Sc(OTf)3    TFA            86    88:12

6      3                  Sc(OTf)3    H              0     n/a
7      3                  Sc(OTf)3    Ac             75    76:24
8      3                  Sc(OTf)3    CHO            61    66:34
9      3                  Sc(OTf)3    C(O)C2F5       69    85:15
10     3                  Sc(OTf)3    C(O)C3F7       79    87:13


Lewis acid and fluoroesters as optimal substrates for stereoinverting 
isocyanation. Eq, equatorial; ax, axial; d.r., diastereomeric ratio; n/a, not 
applicable. c, Application of this method to bisabolyl trifluoroacetate 

13 provides isonitrile 14 cleanly and with high stereoselectivity. 


corresponding trifluoroacetates: 14 (diastereomeric ratio 85:15) and 15 
(enantiomeric ratio 87:13, corresponding to 93% inversion). Proximal 
unsaturation (for example 16) has little or no effect on stereoselectivity. 
The diastereomeric ratio of isonitriles 17 is almost identical to the 
percentage inversion of 15, and so the distal stereocentre seems to have 
essentially no influence but serves as a useful stereochemical probe. 
Lewis basic groups such as esters (16) and nitriles (17) are tolerated but 
tend to decrease reaction rates; catalyst loading was therefore 
increased to 6 mol%. Not surprisingly, alkynes 20 are tolerated, as 
are primary alcohols 21, which are also subjected to solvolysis condi- 
tions as their corresponding trifluoroacetates but do not ionize. 
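
The percentage inversion quoted for 15 can be reproduced with a short calculation. The sketch below assumes a simple model that is not spelled out in the text: both enantiomers of the starting ester react with the same inversion fraction x, so the major product fraction m relates to the major substrate fraction p by m = p·x + (1 − p)(1 − x). The substrate e.r. of 7:93 is taken from Fig. 3, and the function name is illustrative only.

```python
def inversion_fraction(product_major: float, substrate_major: float) -> float:
    """Solve m = p*x + (1 - p)*(1 - x) for the inversion fraction x.

    product_major   -- fraction of the major product enantiomer (m)
    substrate_major -- fraction of the major substrate enantiomer (p)
    """
    if not 0.5 < substrate_major <= 1.0:
        raise ValueError("substrate_major must lie in (0.5, 1.0]")
    # Rearranged: x = (m - (1 - p)) / (2p - 1)
    return (product_major - (1.0 - substrate_major)) / (2.0 * substrate_major - 1.0)


if __name__ == "__main__":
    # Product ratios quoted for 15 and 16; substrate e.r. 7:93 (Fig. 3).
    cases = [("15, 84:16", 0.84), ("15, 87:13", 0.87), ("16, 83:17", 0.83)]
    for label, m in cases:
        x = inversion_fraction(m, 0.93)
        print(f"{label}: {100 * x:.0f}% inversion")
```

Under this assumption the three ratios correspond to about 90%, 93% and 88% inversion, in line with the values quoted in the text and in Fig. 3.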

Naturally occurring marine terpenes are easily accessible with this 
method (also see Fig. 4). For instance, isocyanocadinene’* 22 is synthe- 
sized in two steps from its terrestrial counterpart, cedrelanol (seven 
steps from commercial material), whereas 22 was previously prepared 
in 29 steps. Additionally, trans-androsterone can be elaborated to the 
corresponding tris-trifluoroacetate (see Supplementary Information), 
which on solvolysis selectively produces isonitrile 23 with excellent 
stereoselectivity and chemoselectivity. This stereoinversion reaction 
shows the same stereochemical preference as an SN2 reaction, but with 
reverse substitution demands. Tocopherol-derived quinone 24 can be 
accessed with high diastereoselectivity, obviously independently of its 
two other stereocentres. The monoisonitrile 25 and bisisonitrile 26 can 
be derived from dihydrosclareol by means of one and two stereoinver- 
sions, in which selectivity presumably reflects the different heterolytic 
stabilities of the two trifluoroacetates. 

This procedure enables the stereoinversion of tertiary alcohols to the 
corresponding tertiary-alkyl amines and their derivatives (Fig. 4). For 
instance, 1.11 g of α-bisabolol (10) can be trifluoroacetylated and solvolysed 
to isonitrile 14, which in turn can be hydrolysed to 1.10 g of 
amine 2, which was isolated as a 5:1 ratio of diastereomers. The enan- 
tiomer of 27 was previously synthesized in nine steps (19 mg)”°. Simple 
hydrolysis also converts androsterone-derived isonitrile 23 into amine 
30, and quinone 24 into aza-tocopherol 31 (after reduction of the 
intermediate iminoquinone). Naturally occurring cadinenes 28 and 
29 (ref. 4) are simple to access from 22 by means of hydrolysis or 
sulphurization (see Supplementary Information). 

This isocyanation is of course not without limitation (Fig. 5). Highly 
branched tertiary alcohols (31-33) generally give poor diastereoselec- 
tivity, probably because of an encumbered approach of the nucleophile. 

[Figure 3 panel labels; structures omitted]
15: 78%, 90% inversion* (84:16 from 7:93 e.r.)*
16: 79%, 88% inversion* (83:17 from 7:93 e.r.)*
21: 71%, 90:10 d.r.†
22: 59%, 91:9 d.r.
23: 69%, 91:9 d.r.†
25: 44%, 93:7 d.r.†
26: 34%, 88:12 d.r.
26 (X-ray)

Figure 3 | Probing the selectivity and functional group tolerance of 
isocyanation. Py, pyridine; e.r., enantiomeric ratio; d.r., diastereomeric ratio. 
*Determined by Mosher analysis. †After treatment with methanol and 
triethylamine. 


Furthermore, we discovered early on that cyclohexanols are problem- 
atic substrates. Whereas equatorial trifluoroacetate 34 is solvolysed to 
the axial isonitrile 36 with high stereoselectivity, the displacement of 
axial trifluoroacetate 35 is slower and poorly diastereoselective, and 
actually favours stereoretention. We suspect that this low stereoselect- 
ivity is due to the imperfection of the 4-t-butyl conformational ‘lock’® 
that allows enough torsional mobility for ionization at the 1 position to 
proceed preferentially through a twist-boat conformer. The exact relationship 
between torsional freedom and stereoselection is not clear, but 
nevertheless the more rigid decalins 37 and 11 do not suffer from the 
same promiscuity. It remains unclear whether C-N bond formation 
occurs at the contact ion pairs of equatorial esters 34 and 37, because 
solvent-separated cations of 34 and 37 yield the same axial isonitrile 

[Figure 4 panel labels; structures omitted]
27 (72% from 22)
28 (59% from 22)
29 (97% from 28)
30 (87% from 24)


Figure 4 | Amines, amides and isothiocyanates synthesized from the 
corresponding isonitriles. Chemoselective and stereoselective isocyanation 
allows access to a variety of marine metabolites and allows terrestrial terpenoids 
to be converted to their theoretical marine counterparts. 


products’. Nevertheless, we are optimistic that these solvolysis condi- 
tions may help to shed light on the still obscure stereoelectronic basis for 
stereoselective addition to cyclohexyl cations”. 

A detailed mechanistic picture of the solvolysis (for example, 
13→14) is not yet available, but a general sketch (Fig. 6) and relevant 
observations are worth discussion. Because Sc(OTf)3 is a strong Lewis 
acid, we suspect that it becomes coordinatively saturated (A) on addition 
to TMSCN, and we observe that it dissolves completely. By analogy 
with the high rate constant for inner-sphere water ligand exchange 
of Sc(OTf)3 observed previously, we suspect that A rapidly exchanges 
a TMSCN ligand for ester B to provide activated substrate C. Two 
reactions are then available to C: either SN2 or SN1 displacement (path 


[Figure 5 panel labels; structures omitted]
31: 58%, 70:30 d.r.
32: 78%*, 63:37 d.r.
34 → 36 (Sc(OTf)3, TMSCN): 98:2 d.r. (stereoinversion)
35 → 36 (Sc(OTf)3, TMSCN): 67:33 d.r. (stereoretention)
37 → 12-ax. (Sc(OTf)3, TMSCN): 92:8 d.r. (stereoinversion)
11 → 12-eq. (Sc(OTf)3, TMSCN): 88:12 d.r. (stereoinversion)


Figure 5 | Limitations of tertiary alcohol stereoinversion. Highly branched 
substrates provide lower diastereomeric ratios of isonitriles, as do 
conformationally flexible cyclohexanol derivatives. OTFA, trifluoroacetate. 
*Isolated along with regioisomeric isonitriles. 
1 or path 2). It is possible that departure of the ester and attack by 
TMSCN occur concurrently (path 1), in which the nucleophile stabi- 
lizes the developing positive charge by means of nascent bond forma- 
tion (D), but this occurrence is rare’ and has been ruled out in related 
solvolysis reactions’’. An alternative possibility is that ionization of C 
occurs (path 2) to provide contact ion pair E, and TMSCN approaches 
the planar carbocation opposite the large and electron-rich counter- 
anion to provide stereochemically inverted isonitrile 8 after silyl transfer 
to the trifluoroacetate. If instead of attack E undergoes ion dissociation 
to solvated cation F (path 3), then attack of F from either face would 
provide the stereochemically scrambled isonitrile rac-8. Thus, the 
stereochemical ratio of product isonitriles should depend on the relative 
rates of attack (E→8) versus anion-solvent exchange (E→F), and two 
observations are worth noting. First, the identity of the ester (R = H, 
CH3, CF3, C3F7) affects the stereochemical outcome, but the contri- 
bution of these esters to the relative rates of attack versus anion-solvent 
exchange is unclear. Transient interactions between the fluorines on the 
perfluoroalkanoates and the carbocation”, or differential distribution 
of charge between the alkanoate and metal centre’’, could explain this 
trend. The overall reaction rates of acetates are higher than those of 
trifluoroacetates, but the inverse relationship of reaction rate and stereochemical 
outcome may be coincidental. Second, the percentage inversion 
of trifluoroacetyl ester 13 to isonitrile 14 was found to increase 
as the temperature was lowered: 79:21 at 22 °C, compared with 83:17 at 
3 °C. Given these trends, further investigation is necessary to provide a 
clear mechanistic picture; however, three control experiments were also 
performed in support of Fig. 6. Replacement of Sc(OTf)3 with trimethylsilyl 
triflate (TMSOTf) under otherwise identical conditions did not 


[Figure 6 scheme: Sc(OTf)3 with TMSCN gives the coordinatively saturated complex A; species B to F and paths 1 to 3 are as described in the text; structures omitted]


promote any reaction of trifluoroacetyl esters, which excludes electro- 
philic silicon as the active Lewis acid. Addition of 2,6-di-t-butylpyridine 
(3:1 with catalyst) did not inhibit product formation, which excludes 
Brønsted acid catalysis, although increased amounts of elimination were 
observed. Finally, addition of stoichiometric trimethylsilyl trifluoroacetate 
(TMSO2CCF3) did not affect the rate or stereoselectivity of the 
reaction, which indicates no participation of the product acetate in 
stereochemical scrambling. We are working to expand this methodology 
to include other nucleophiles, but preliminary data suggest that the 
current reaction cannot be directly applied to different reactants, and 
in fact is surprising in its uniqueness. For instance, in acetonitrile or 
trimethylsilylacetonitrile, the perfluoroalkanoates do not react at all, 
whereas trimethylsilylazide and trimethylsilylisothiocyanate solvents 
produce complex mixtures. 
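
The two product ratios reported above for 13→14 (79:21 at 22 °C and 83:17 at 3 °C) can be placed on an energy scale with a rough calculation. The sketch below assumes, purely for illustration, that each ratio reflects two competing pathways with a single apparent free-energy difference, ΔΔG = RT ln(major/minor). The text itself attributes the ratio to competing attack and anion-solvent exchange, so the number is only an apparent quantity; the function name is hypothetical.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def apparent_ddg_kj_per_mol(major: float, minor: float, temp_celsius: float) -> float:
    """Return RT*ln(major/minor) in kJ/mol for a product ratio at one temperature."""
    if major <= 0 or minor <= 0:
        raise ValueError("ratio components must be positive")
    temp_kelvin = temp_celsius + 273.15
    return R_GAS * temp_kelvin * math.log(major / minor) / 1000.0


if __name__ == "__main__":
    # Ratios and temperatures as quoted in the text for 13 -> 14.
    for major, minor, temp in [(79, 21, 22.0), (83, 17, 3.0)]:
        ddg = apparent_ddg_kj_per_mol(major, minor, temp)
        print(f"{major}:{minor} at {temp:g} °C -> {ddg:.2f} kJ/mol")
```

Both ratios correspond to an apparent preference of roughly 3 to 4 kJ/mol, which illustrates how small an energetic bias separates the observed selectivities.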

Here we have developed a method for the stereoselective conversion 
of tertiary alcohols to tertiary-alkyl amines, which fills a major meth- 
odological gap in organic chemistry. The reaction entails a Lewis-acid- 
catalysed solvolysis of tertiary alcohol derivatives, a conceptual advance 
that may enable the expansion of this chemistry to related systems. The 
reaction proceeds with inversion of configuration by means of an iso- 
nitrile and works best on minimally branched linear tertiary alcohols, 
and conformationally inflexible alicyclic alcohols. The reaction is che- 
moselective for tertiary-trifluoroacetyl esters in preference to secondary 
or primary ones, which do not react even on prolonged exposure to the 
solvolysis conditions. This reactivity complements standard stereoinversion 
(SN2) reactions, which favour secondary and primary substrates, 
whereas most tertiary substrates fail. In addition, this method converts 
readily available terrestrial terpenoid alcohols into their marine counter- 
parts, which should greatly facilitate the preparation of these fascinating 
and potentially useful molecules. It is possible that the same overall 
transformation is used in biosynthetic pathways. This reaction adds a 
new retrosynthetic manoeuvre to the cache of transforms available to 
organic chemistry. We expect that the general attributes of this reaction 
will lead to the development of other stereoinversion reactions of tertiary 
alcohols and stimulate further advances in carbocation chemistry. 


METHODS SUMMARY 


A solution of trifluoroacetate 13 (32.0 mg, 0.1 mmol) in TMSCN (0.1 ml) was cooled 
to 0 °C and treated with a solution of anhydrous Sc(OTf)3 (1.5 mg, 0.003 mmol) in 
TMSCN (0.1 ml). The reaction mixture was left for 18 h at 3 °C and quenched with 
tetramethylethylenediamine (7.5 µl, 0.05 mmol). The resulting solution was concentrated 
under reduced pressure, and the residue was purified by flash column chromatography 
on silica gel (elution with 35% dichloromethane in hexanes) to deliver 
18.0 mg (78% yield) of 7-isocyano-7,8-dihydro-α-bisabolene (14). See the notes in 
Supplementary Information regarding safety. 
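
The quantities in this procedure can be cross-checked with a few lines of arithmetic. The sketch below is an illustration only: the molecular formulas (C17H25F3O2 for trifluoroacetate 13, C16H25N for isonitrile 14 and ScC3F9O9S3 for Sc(OTf)3) are assumptions inferred from the compound names rather than values given in the text, and the atomic masses are rounded standard values.

```python
# Rounded standard atomic masses (g/mol).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "F": 18.998, "S": 32.06, "Sc": 44.956}

# Assumed compositions, inferred from the compound names (not stated in the text).
ESTER_13 = {"C": 17, "H": 25, "F": 3, "O": 2}
ISONITRILE_14 = {"C": 16, "H": 25, "N": 1}
SC_TRIFLATE = {"Sc": 1, "C": 3, "F": 9, "O": 9, "S": 3}


def molar_mass(composition: dict) -> float:
    """Sum atomic masses weighted by atom counts (g/mol)."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


def millimoles(mass_mg: float, composition: dict) -> float:
    """Convert a mass in mg to mmol (mg divided by g/mol gives mmol)."""
    return mass_mg / molar_mass(composition)


def percent_of(amount_mmol: float, reference_mmol: float) -> float:
    """Express one molar amount as a percentage of another."""
    return 100.0 * amount_mmol / reference_mmol


if __name__ == "__main__":
    substrate = millimoles(32.0, ESTER_13)
    catalyst = millimoles(1.5, SC_TRIFLATE)
    product = millimoles(18.0, ISONITRILE_14)
    print(f"substrate: {substrate:.3f} mmol")
    print(f"catalyst loading: {percent_of(catalyst, substrate):.1f} mol%")
    print(f"isolated yield: {percent_of(product, substrate):.0f}%")
```

With these assumed formulas the catalyst loading comes out close to 3 mol% and the yield within about one percentage point of the reported 78%; the small difference arises because the reported yield is presumably based on the nominal 0.1 mmol.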


Changes in North Atlantic nitrogen fixation 
controlled by ocean circulation 


Marietta Straub, Daniel M. Sigman, Haojia Ren, Alfredo Martinez-Garcia, A. Nele Meckler, Mathis P. Hain & Gerald H. Haug 


In the ocean, the chemical forms of nitrogen that are readily avail- 
able for biological use (known collectively as ‘fixed’ nitrogen) fuel 
the global phytoplankton productivity that exports carbon to the 
deep ocean’ *. Accordingly, variation in the oceanic fixed nitrogen 
reservoir has been proposed as a cause of glacial-interglacial changes 
in atmospheric carbon dioxide concentration’. Marine nitrogen fixa- 
tion, which produces most of the ocean’s fixed nitrogen, is thought to 
be affected by multiple factors, including ocean temperature* and the 
availability of iron*** and phosphorus®. Here we reconstruct changes 
in North Atlantic nitrogen fixation over the past 160,000 years from 
the shell-bound nitrogen isotope ratio (¹⁵N/¹⁴N) of planktonic foraminifera 
in Caribbean Sea sediments. The observed changes cannot 
be explained by reconstructed changes in temperature, the supply of 
(iron-bearing) dust or water column denitrification. We identify a 
strong, roughly 23,000-year cycle in nitrogen fixation and suggest 
that it is a response to orbitally driven changes in equatorial Atlantic 
upwelling’, which imports ‘excess’ phosphorus (phosphorus in stoi- 
chiometric excess of fixed nitrogen) into the tropical North Atlantic 
surface**®. In addition, we find that nitrogen fixation was reduced 
during glacial stages 6 and 4, when North Atlantic Deep Water had 
shoaled to become glacial North Atlantic intermediate water*, which 
isolated the Atlantic thermocline from excess phosphorus-rich mid- 
depth waters that today enter from the Southern Ocean. Although 
modern studies have yielded diverse views of the controls on nitro- 
gen fixation’***, our palaeobiogeochemical data suggest that excess 
phosphorus is the master variable in the North Atlantic Ocean and 
indicate that the variations in its supply over the most recent glacial 
cycle were dominated by the response of regional ocean circulation 
to the orbital cycles. 


Nitrogen fixation, the conversion of N2 to ammonia, by cyanobacteria 
in surface waters seems to dominate the input of fixed N to the 
ocean. The main loss of fixed N is biological reduction to N2 (generalized 
here as ‘denitrification’) in marine sediments and in the suboxic 
zones of the water column in the eastern tropical Pacific Ocean and the 
Arabian Sea. Nitrogen fixation introduces N with δ¹⁵N of about −1‰ 
(δ¹⁵N = (¹⁵N/¹⁴N)sample/(¹⁵N/¹⁴N)reference − 1, where the reference is 
atmospheric N2), whereas water column denitrification preferentially 
removes ¹⁴N-bearing nitrate (NO3⁻). In total, global ocean denitrification 
raises the δ¹⁵N of mean ocean nitrate above that of newly fixed N (ref. 10). 
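
The delta notation defined above can be written as a pair of small conversion functions. This is a minimal sketch: the value used for the ¹⁵N/¹⁴N ratio of atmospheric N2 (0.0036765) is an assumed, commonly cited figure that does not appear in the text, and the function names are illustrative.

```python
# Assumed 15N/14N ratio of atmospheric N2 (commonly cited value, not from the text).
AIR_15N_14N = 0.0036765


def delta_permil(ratio_sample: float, ratio_reference: float = AIR_15N_14N) -> float:
    """delta = (R_sample / R_reference - 1), expressed in per mil."""
    return (ratio_sample / ratio_reference - 1.0) * 1000.0


def ratio_from_delta(delta_value_permil: float, ratio_reference: float = AIR_15N_14N) -> float:
    """Invert the definition: R_sample = R_reference * (1 + delta / 1000)."""
    return ratio_reference * (1.0 + delta_value_permil / 1000.0)


if __name__ == "__main__":
    # Newly fixed N (about -1 per mil) versus a nitrate pool at about +5 per mil.
    for label, delta in [("newly fixed N", -1.0), ("nitrate at +5 per mil", 5.0)]:
        ratio = ratio_from_delta(delta)
        print(f"{label}: delta = {delta:+.1f} per mil, 15N/14N = {ratio:.7f}")
```

The example makes the scale of the signal explicit: a difference of several per mil corresponds to a change in the isotope ratio in only the fifth decimal place.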

Sediment records show N isotopic evidence of reduced water column 
denitrification during the Last Glacial Maximum (LGM) and other cold 
phases of the most recent glacial cycle, relative to the current interglacial 
(the Holocene epoch) and times of intermediate climate states’! (for 
example Marine Isotope Stage (MIS) 3, from 57 to 29 kyr ago). Benthic 
denitrification, because of its weak effect on the δ¹⁵N value of nitrate in 
the water column””, has not been directly reconstructed, but it may also 
have decreased during ice ages because of reduced continental shelf area 
due to lower sea level’. 

The history of N fixation has been vigorously debated. As with 
water column denitrification, N fixation is apparent in the spatial 
pattern of nitrate δ¹⁵N. Against the background of the mean ocean nitrate 
δ¹⁵N of ~5‰, N fixation in the euphotic zone followed by the sinking 
and remineralization of organic matter adds low-δ¹⁵N nitrate to the 
thermocline, which is observed across the western subtropical and tropical 
North Atlantic. This low δ¹⁵N value is recorded by the organic N 
bound within the shell walls of planktonic foraminifera. Foraminifera-bound 
δ¹⁵N (FB-δ¹⁵N) from the Caribbean Sea and the Gulf of Mexico 
indicates that N fixation was reduced during the LGM. 

Figure 1 | Core locations, surface winds, excess P at 20-m depth and main 
surface currents. The star indicates ODP Site 999 Hole A (12° 45′ N, 
78° 44′ W; 2,828 m), the core in which FB-δ¹⁵N was measured to reconstruct 
North Atlantic N fixation. The filled red circle indicates ODP Site 658 
(20° 45′ N, 18° 35′ W; 2,263 m), from which sediment Zr/Al data are reported 
in relation to North African aeolian flux. The open blue circle indicates core 
RC24-7 (1° 21′ S, 11° 55′ W; 3,899 m), previously published data from which 
reveal precession cycles in equatorial Atlantic upwelling. The open red circles 
indicate ODP Site 659 (18° 05′ N, 21° 02′ W; 3,070 m) and ODP Site 663 
(1° 11.9′ S, 11° 52.7′ W; 3,708 m), from which come previously published data 
pertaining to the aeolian flux. Excess P ([PO4³⁻] − [NO3⁻]/16; ref. 6) is an 
annual average. Black arrows show June-August winds, with length indicating 
speed. NBC, north Brazil current; NEC, north equatorial current; SEC, south 
equatorial current. 

Here we build on the FB-δ¹⁵N results of ref. 15 with new data from the 
same core, Ocean Drilling Program (ODP) Site 999 Hole A in the Caribbean 
Sea (Fig. 1), extending the record back over the past 160 kyr 
through a full glacial cycle (Fig. 2a, b). We measured FB-δ¹⁵N in two 
euphotic-zone-dwelling species of planktonic foraminifera, Globigerinoides 
ruber and Globigerinoides sacculifer. The records from the two species are 
very similar (Fig. 2a and Supplementary Fig. 1) and highly coherent 
(Supplementary Fig. 6). The FB-δ¹⁵N of G. ruber (Fig. 2a, green symbols) 
is typically slightly (~0.3‰) lower than that of G. sacculifer (Fig. 2a, blue 
symbols). There is a greater (=0.6‰) interspecies difference during the 
LGM, in glacial MIS 4 and during the transition from MIS 6 to MIS 5 
(Fig. 2a and Supplementary Fig. 1). Although changes in the interspecies 

Figure 2 | FB-δ¹⁵N in ODP Site 999 and its relationship to SST, dust flux 
and water column denitrification. a, FB-δ¹⁵N was measured separately in two 
species, G. sacculifer (blue) and G. ruber (green). The black symbols show the 
two-species average FB-δ¹⁵N at each sampling depth. As plotted, increasing N 
fixation is upward. b, Globigerinoides ruber δ¹⁸O at ODP Site 999. In a and 
b, precession is dashed orange. c, The 23-kyr precession-band-filtered FB-δ¹⁵N 
(black) and G. ruber δ¹⁸O (grey) illustrate the 4-7-kyr phase lag between δ¹⁵N 
and δ¹⁸O at the precession band (Supplementary Figs 6 and 9). Given an 
average phase lag of 4-7 kyr between δ¹⁸O and precession, this observation 
confirms the correspondence of δ¹⁵N minima with precession maxima (June 
aphelion). (δ¹⁸O = (¹⁸O/¹⁶O)sample/(¹⁸O/¹⁶O)reference − 1, where the reference 
is Vienna PeeDee Belemnite.) d, Globigerinoides ruber Mg/Ca-derived SST at 
ODP Site 999. e, Atlantic dust-related records are mass accumulation rate of 
iron (MAR Fe) at ODP Site 999, Zr/Al at ODP Site 658 and terrigenous 
material abundance at ODP Site 663. f, g, The Pacific bulk sediment δ¹⁵N 
records are from the Mexican margin (g) and the equatorial Pacific (f), with 
increasing denitrification plotted upward (see also Supplementary Fig. 5). 

δ¹⁵N difference probably hold valuable information, given the overarching 
similarity of the G. ruber and G. sacculifer FB-δ¹⁵N records, we focus 
hereafter on the FB-δ¹⁵N record generated by averaging the two species-specific 
records (Fig. 2a, black symbols). 

The FB-δ¹⁵N of the previous interglacial (early MIS 5) is similar to 
that of the Holocene, and the FB-δ¹⁵N of the penultimate glacial maximum 
(MIS 6) is similar to that of the LGM. There is no evidence from 
previous work for mean ocean nitrate δ¹⁵N changes across the most 
recent glacial cycle that mimic the FB-δ¹⁵N record of Site 999, with 
the exception of a possible δ¹⁵N maximum during the most recent 
deglaciation. Furthermore, South China Sea FB-δ¹⁵N data support 
the interpretation of the Caribbean LGM-to-Holocene FB-δ¹⁵N 
decrease as recording a regional change in N fixation. The FB-δ¹⁵N 
change through the Caribbean Sea record is of similar amplitude to the 
regional isotopic imprint of N fixation, in which nitrate δ¹⁵N decreases 
from ~5.3‰ at ~1,200-m depth in the water column to ~2.5‰ at 
~200 m (ref. 14). Thus, the FB-δ¹⁵N record indicates proportionally 
large variations in N fixation. The record of bulk sediment δ¹⁵N at 
Site 999 shows a number of minor changes that coincide with changes 
in FB-δ¹⁵N, but they are so weak as to be unnoticeable without the 
benefit of the larger FB-δ¹⁵N changes (Supplementary Fig. 2). The 
apparent muting of the variations suggests an allochthonous N input 
to the bulk sediment that does not respond to open-ocean changes. 

The FB-δ¹⁵N variation is highly correlated with Earth’s 19-23-kyr 
orbital precession cycle (Fig. 2a and Supplementary Figs 6, 8 and 10). 
FB-δ¹⁵N minima consistently lag G. ruber calcite δ¹⁸O minima by 4-7 kyr 
(Fig. 2c and Supplementary Figs 6 and 9), and they thus lag peak 
Northern Hemisphere summer insolation at the precession frequency 
band by 10-12 kyr (ref. 19). That is, minima in Caribbean FB-δ¹⁵N 
(maxima in N fixation) occur during aphelion in the northern summer 
(precession maxima; Fig. 2a), which is the opposite of the precession 
phase that encourages Northern Hemisphere summer warming and 
deglaciation. Despite this precessional phasing, FB-δ¹⁵N is higher (N 
fixation is reduced) during the major glacial periods (MIS 6 and 
MISs 4-2) relative to the interglacials (MISs 5 and 1). This implies that 
there are non-precessional step changes in FB-δ¹⁵N at each of the 
major deglaciations (the transitions between MISs 6 and 5 and between 
MISs 2 and 1) and at the intensification of glaciation at the transition 
between MISs 5 and 4. 
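
The phase argument above can be checked by converting the quoted lag into a phase angle. The sketch below assumes a single representative period of 23 kyr for the precession band, as used for the filtered records in Fig. 2c; the function name is illustrative.

```python
def lag_to_phase_degrees(lag_kyr: float, period_kyr: float) -> float:
    """Express a time lag as a phase angle of a cycle with the given period."""
    if period_kyr <= 0:
        raise ValueError("period_kyr must be positive")
    return 360.0 * lag_kyr / period_kyr


if __name__ == "__main__":
    period = 23.0  # kyr, assumed representative precession period
    for lag in (10.0, 12.0):
        phase = lag_to_phase_degrees(lag, period)
        print(f"lag of {lag:g} kyr = {phase:.0f} degrees of a {period:g}-kyr cycle")
```

A lag of 10-12 kyr brackets half of a 23-kyr cycle (11.5 kyr, or 180 degrees), which is why the FB-δ¹⁵N minima fall at the opposite precession phase from peak northern summer insolation.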

Temperature is recognized as a control on the distribution of N 
fixation, both in the ocean and on land*. Thus, ice-age cooling might 
be expected to reduce N fixation, consistent with the basic LGM- 
Holocene change in N fixation. However, the full record indicates that 
N fixation does not track Caribbean sea surface temperature (SST) as 
reconstructed from the Mg/Ca ratio of G. ruber at Site 999 (Fig. 2d 
and Supplementary Fig. 9c). SST rose early in the penultimate degla- 
ciation (the transition between MISs 6 and 5) and had already begun to 
decline before N fixation reached the first of its three maxima in MIS 5, 
with the subsequent maxima in N fixation occurring against a rela- 
tively stable baseline of low SST (Fig. 2d and Supplementary Fig. 7a, b). 
Hence, SST change does not seem to explain the history of N fixation 
reconstructed from FB-δ¹⁵N. 

Iron is a critical nutrient for N fixers, and iron-bearing dust has 
been suggested to cause ice-age enhancements in N fixation*”. In our 
comparison of the N fixation record at Site 999 with records of dust 
input, we include a new reconstruction of dust transport from ODP 
Site 658, a coring site off the coast of North Africa, which is the domi- 
nant source of dust to the Caribbean (Fig. 1). The Zr/Al ratio increases 
with high-energy aeolian transport (Supplementary Fig. 3). Focusing 
first on the large-scale glacial-interglacial changes, the FB-δ¹⁵N record 
at Site 999 indicates that N fixation was, if anything, inversely correlated 
with Zr/Al at Site 658; reconstructed dust flux at Site 659, which 
is farther offshore; terrigenous material abundance at equatorial Atlantic 
Site 663; and iron accumulation rate at Site 999 (Fig. 2e and Supplementary 
Fig. 4). The correlation of N fixation with dust over the 
precession cycle is more difficult to assess. North African aridity has 

Figure 3 | Comparison of FB-δ¹⁵N with changes in precession-paced 
equatorial Atlantic upwelling and glacial-interglacial Atlantic intermediate 
water source. a, Florisphaera profunda abundance (light blue) and 
foraminifera-species-derived SST (dark blue) from equatorial Atlantic core 
RC24-7. b, FB-δ¹⁵N from ODP Site 999. c, Bottom water carbonate ion 
reconstruction from Caribbean Sea core VM28-122 (12° N, 79° W; 3,620 m). 
The orange dashed line in a is the precession parameter. Episodes of higher 
equatorial Atlantic upwelling led to higher N fixation in the North Atlantic, 
observed as lower FB-δ¹⁵N. In c, the switches in North Atlantic intermediate 
water source between AAIW, the dominant source of excess P to the Atlantic, 
and GNAIW, which was low in [PO4³⁻] (refs 28,32) and presumably also in 
excess P, can be seen. Periods of inflow of AAIW stimulated N fixation. 


been reported to increase during northern summer aphelion, when 
we observe higher N fixation. However, available records suggest, on 
balance, that dust input is out of phase with the precessional cycles in N 
fixation (Fig. 2 and Supplementary Information). All things consi- 
dered, it does not seem that a dust-derived iron supply can explain 
the reconstructed N fixation changes. 

North Atlantic N fixation might be expected to change so as to 
balance denitrification changes. Bulk sediment δ¹⁵N records near 
the major suboxic zones indicate a lower rate of water column denitrification 
during cold periods, in the same sense as the N fixation 
decrease that we reconstruct for the LGM, the glacial of MIS 4 and the 
glacial maximum of MIS 6 (Fig. 2f, g and Supplementary Fig. 5). 
However, beyond this coarsest scale, there are major differences. For 
example, in contrast to the sediment δ¹⁵N records from the denitrification 
zones, FB-δ¹⁵N at Site 999 during early MIS 3 is similar to that 
during the LGM, with these local δ¹⁵N minima separated by the highest 
sustained δ¹⁵N of the record at ~35 kyr ago in late MIS 3 (Fig. 2a). 
In addition, tropical Pacific bulk sediment δ¹⁵N maxima (water column 
denitrification maxima) seem to be associated with minima in 
Earth’s orbital precession cycle (Fig. 2f), when perihelion coincides 
with northern summer. The FB-δ¹⁵N record indicates that these times 
represent minima, not maxima, in Atlantic N fixation. Although a 
global ocean reconstruction of denitrification does not yet exist, the 
available records do not provide clear signs that changes in water 
column denitrification are responsible for the timing of changes in 
North Atlantic N fixation over the most recent glacial cycle. 

The importance of precession in Caribbean FB-δ¹⁵N suggests a connection 
to low-latitude climate. Over recent glacial cycles, equatorial 
Atlantic upwelling varied with a precession frequency’. In the eastern 
equatorial Atlantic, climatic precession maxima (aphelion in northern 
summer) coincide with minima in reconstructed SST (Fig. 3a). In addi- 
tion, these times are characterized by minima in the fraction of the 
nanofossils represented by the nutricline dwelling form Florisphaera 
profunda (Fig. 3a), reflecting a shoaling of the nutricline’. These obser- 
vations indicate that precession maxima yielded stronger equatorial 
Atlantic upwelling’. 

Precession-paced change in the equatorial Atlantic upwelling, coupled 
with a tendency of N fixation to occur in N-depleted, phosphorus-bearing 
water, explains the precession signal in Caribbean FB-δ¹⁵N. Compared 
with the thermocline and surface layer of the western North Atlantic, 
subsurface water upwelled along the Equator has higher ‘excess P’** 
(defined as the phosphate concentration minus 1/16th of the nitrate 
concentration; Fig. 1). In waters with positive excess P, phytoplankton 
tends to deplete the water of nitrate before the complete consumption of 
phosphate. The residual phosphate in this nitrate-free water should 
encourage the growth of N-fixing phytoplankton”’. In line with this 
expectation, data-assimilating model calculations suggest that the 
transport into the tropical North Atlantic of equatorial water bearing 
excess P leads to a band of high N fixation in this region®”*. Moreover, N 
fixation seems to respond in the equatorial Atlantic upwelling region 
itself, before lateral transport of surface waters**. The excess P of the 
waters upwelled along the Equator originates in sub-Antarctic mode 
water (SAMW) and Antarctic intermediate water? (AAIW). With cli- 
matic precession maxima, increased equatorial upwelling of excess-P- 
bearing water coupled with northwestward transport of equatorial sur- 
face waters into the western tropical North Atlantic by the north Brazil 
current would increase the supply of excess P to the latter region, 
intensifying N fixation there (Fig. 3). Although the Amazon outflow 
may also contribute to the excess P in the north Brazil current (Fig. 1), 
its response to precession’ seems to be inconsistent with our recon- 
structed N fixation changes. 
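
The excess P definition used in this paragraph (phosphate minus 1/16th of nitrate) can be written out as a short calculation. The sketch below is illustrative only: the function name and the sample concentrations are hypothetical and are not measurements from this study.

```python
def excess_p(phosphate, nitrate, n_to_p=16.0):
    """Excess P = [phosphate] - [nitrate] / 16 (same concentration units for both)."""
    return phosphate - nitrate / n_to_p


# Hypothetical concentrations in micromoles per kilogram, chosen for illustration.
upwelled_water = excess_p(phosphate=1.5, nitrate=20.0)   # positive excess P
surface_water = excess_p(phosphate=0.05, nitrate=0.8)    # essentially no excess P

print(f"upwelled water excess P: {upwelled_water:.2f}")
print(f"surface water excess P:  {surface_water:.2f}")
```

Water with positive excess P still holds phosphate once nitrate is exhausted, which is the condition the text links to N fixation.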

The baseline shifts in FB-δ¹⁵N at the respective transitions between
MISs 6 and 5, 5 and 4, and 2 and 1 parallel well-known changes in the 
depth of North Atlantic ventilation (Fig. 3c): glacial North Atlantic 
intermediate water (GNAIW) formed during glacial stages 6 and 4-2, 
rather than the North Atlantic Deep Water (NADW) of interglacial 
stages®. GNAIW formation reduced the nutrient concentration at the 
base of the Atlantic thermocline” by preventing the influx of AAIW and 
SAMW or by diluting their impact, or both (Fig. 4). The low nutri- 
ent content of GNAIW implies that it formed from nutrient-depleted, 
low-latitude North Atlantic surface water, which today lacks excess P 
(ref. 5). Thus, the switch from NADW to GNAIW lowered the excess P 
of the subsurface waters to be upwelled (Fig. 4). A plausible alternative 
for the step changes in FB-δ¹⁵N is that Atlantic N fixation was responding
to sea-level-driven reductions in sedimentary denitrification during
glacial stages 6 and 4-2. However, a shelf-associated decrease in sedi- 
mentary denitrification would have lowered excess P in the low-latitude 
upper ocean within decades to centuries of the sea-level decline at the 
onset of MIS 4. The FB-δ¹⁵N increase in glacial stage 4 occurred ~7 kyr
after the sea-level decrease (Fig. 2a, b and Supplementary Fig. 9a), incon- 
sistent with a response to regional sedimentary denitrification but con- 
sistent with the timing of the switch to GNAIW*. 

Although variations in Atlantic N fixation do not seem to have 
paralleled global water column denitrification over the past 160 kyr, 
this does not require an imbalance between N fixation and denitrifica- 
tion on a global ocean basis; indeed, a sensitivity to phosphorus would 
encourage a global balance’. At the same time, this study recalibrates 
our expectations regarding the regional drivers of biogeochemical 
change in the low-nutrient tropical and subtropical ocean. Water col- 
umn denitrification is broadly recognized as being sensitive to the 
physical circulation, given that the major ocean suboxic regions are 
located in the tropical ‘shadow zones’, the poorly ventilated subsurface 
volumes east of the subtropical gyres. Our results indicate that ocean 
circulation, through its control of nutrient transport, can also drive 
changes in regional rates of N fixation’. 

Although iron limitation of phytoplankton is fundamental to ocean 
biogeochemistry~*”, the data reported here suggest that, in the North 
Atlantic, the rate of N fixation over the past 160 kyr has not been tightly 
constrained by iron supply. The entire ocean N and P reservoirs must flow 
through the North Atlantic on the timescale of ocean ventilation (~1 kyr), 
which is probably shorter than the residence time of the ocean’s fixed N 
reservoir’. Thus, if N fixation in the rest of the ocean is significantly limited 
Figure 4 | Effect of changes in glacial-interglacial circulation on N fixation
in the Atlantic. a, During the Holocene and MIS 5, SAMW and AAIW import 
high excess P (red) to the low latitudes, where its upwelling stimulates N 
fixation, which removes the excess P (conversion from red to blue). North 
Atlantic surface water, SAMW and AAIW are incorporated into newly formed 
NADW, which is thus intermediate in excess P (purple; ref. 5). b, In glacial 
stages MIS 6 and MISs 4-2 (where MIS 2 is the LGM), the low excess P of 
southward-flowing GNAIW would have diluted or replaced the high excess P 
of SAMW and AAIW, reducing the concentration of excess P available to be 
upwelled, and thus lowering N fixation. The precession-paced changes in 
equatorial Atlantic upwelling are not indicated in this diagram. 


by iron, then the North Atlantic may have an important role in main- 
taining the relationship between the global ocean N and P reservoirs. 


METHODS SUMMARY 


FB-δ¹⁵N was analysed by oxidizing organic N to nitrate and then using a bacterial
method to convert the nitrate to nitrous oxide for isotope ratio mass spectrometry.
The individual species were picked manually, to obtain 3-7 mg of foraminifera per 
sample. The picked sample was crushed and underwent reductive and then oxid- 
ative cleaning to remove all N not bound within (and thus physically protected by) 
the calcite walls of the foraminifera tests. Dissolution of the tests with HCl released 
the test-bound organic N, which was then oxidized to nitrate. Sediment Zr/Al was 
measured with an Avaatech XRF Core Scanner at MARUM, University of Bremen. 
Data are X-ray fluorescence count ratios and were obtained at a resolution of 1 cm
over an area of 1.2 cm² directly at the split-core surface of the archive half.


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS


Sample preparation for FB-δ¹⁵N. The protocol follows that of ref. 15. The individual
species are picked manually under a dissecting microscope (315-425-μm
size fraction). Foraminifera (3-7 mg, or 600-800 specimens, per sample) are used 
to carry out the analysis and are gently crushed. Clay particles are removed using a 
2% polyphosphate solution and 5-min sonication in an ultrasonic bath. The sam- 
ples are then rinsed with deionized water. Dithionite-citric acid (10 ml) is added to 
each sample, and the solution is kept for 1 h in a water bath at 80 °C to remove any 
metal coatings. After being rinsed with deionized water, the sample undergoes 
oxidative cleaning with a basic potassium persulphate solution at 100 °C for 1h to 
remove external organic N. The cleaned samples are rinsed in deionized water and 
dried overnight at 60 °C. 

Conversion of foraminifera-bound N to nitrate. Foraminifera (3-6 mg per 
sample) are weighed into a previously combusted glass vial and dissolved in 4 N
HCl (40-60 μl per sample). To convert the released organic N to nitrate, basic
potassium persulphate oxidizing solution is added to each vial and to vials contain- 
ing organic standards and procedural blanks, and the vials are then autoclaved for 
1h on a slow-vent setting (1.5h including warm-up and cool-down times). To 
lower the N blank associated with the oxidizing solution, the potassium persul- 
phate is recrystallized two to three times. At the time of processing, 1 g NaOH and 
1g potassium persulphate are dissolved in 100 ml of deionized water. Organic 
standards are used to constrain the δ¹⁵N of the persulphate reagent blank. The two
organic standards used here are mixtures of 6-aminocaproic acid and glycine. A 
minimum of 10 organic standards and 3-5 blanks are analysed per batch of 
samples, allowing for a correction for the persulphate blank. We used 1 ml of 
persulphate reagent for the blanks, oxidation standards and foraminifera samples. 

Determination of N content. To determine sample N content, we measure nitrate
concentration in the oxidation solutions after autoclaving. The nitrate analysis is
done by reduction to nitric oxide using vanadium(III) followed by chemiluminescence
detection. The blank is also quantified in this way. The G. ruber and
G. sacculifer samples have average N content of 3-4 μmol N per gram of sample,
yielding nitrate concentrations in the oxidation solutions of 10-20 μM, whereas
the blank concentration ranges between 0.3 and 0.7 μM.
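
The concentrations quoted above can be checked with a quick calculation. The sketch below assumes the N content is in micromoles per gram, that 1 ml of persulphate reagent is the only significant volume (the 40-60 μl of HCl is neglected), and that the test is 5 mg; the function names are hypothetical.

```python
def nitrate_concentration_um(sample_mg, n_content_umol_per_g, volume_ml=1.0):
    """Nitrate concentration (uM) after oxidizing all test-bound N in the sample."""
    n_nmol = sample_mg * n_content_umol_per_g   # mg x umol/g = nmol
    return n_nmol / volume_ml                   # nmol/ml = umol/l = uM


def blank_fraction(blank_um, sample_um):
    """Share of the measured nitrate that comes from the reagent blank."""
    return blank_um / sample_um


# Assumed illustrative case: a 5 mg sample with 3 umol N per gram.
conc = nitrate_concentration_um(sample_mg=5.0, n_content_umol_per_g=3.0)
print(f"nitrate in oxidation solution: {conc:.1f} uM")
print(f"blank share at 0.5 uM blank:   {blank_fraction(0.5, conc):.1%}")
```

Under these assumptions the result falls inside the 10-20 μM range given in the text, and the blank contributes a few per cent of the signal.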

Denitrifier method. A detailed explanation of the denitrifier method can be 
found in ref. 34. Before adding the foraminifera samples to the bacteria, the sample 
solution is acidified to pH 3-6. The denitrifier Pseudomonas chlororaphis was used 
for this work. Normally, 5 nmol samples are added to 1 ml of bacterial concentrate 
after the degassing. Oxidation standards, as well as replicate analyses of nitrate 
reference material IAEA-N3 (with δ¹⁵N of 4.7‰) and a bacterial blank, are also
measured. The IAEA-N3 standards are used to monitor the bacterial conversion 
and mass spectrometry, and the oxidation standards are used for the final correc- 
tion of the data. If possible, samples are oxidized in duplicate, and all oxidized 
samples are analysed by the denitrifier method in duplicate at least. The reported 
error is the standard deviation (1σ) estimated from the means of separate oxidations
of cleaned foraminiferal material.

Age model. We use the age model in refs 20,35. This age model is based on 
radiocarbon dating for the past 21.6 kyr. For the rest of the record, the age model 
is based on the alignment between the δ¹⁸O record of G. ruber and the LR04
benthic stack reference curve.

Bulk sediment δ¹⁵N analysis. The total N content of the sediment was analysed as
N₂ using a Thermo Fisher Series 1112 elemental analyser coupled with a Thermo
Fisher Delta V Plus mass spectrometer at ETH Zurich. Between 40 and 60 mg of 
sediment was analysed. In-house standards of atropine and peptone were measured 
in the same runs, and the final corrections were based on the peptone standard, 
which has been referenced to international reference materials. Standard deviations 
for both standards were <0.2‰.

X-ray fluorescence scanning. Sediment Zr/Al X-ray fluorescence (XRF) count 
ratios were obtained with an Avaatech XRF Core Scanner at MARUM, University 
of Bremen. Data were obtained at a resolution of 1 cm over an area of 1.2 cm²
directly at the split-core surface of the archive half, with different settings for light
elements such as Al (10 kV, 20 s and 150 mA) and heavy elements such as Zr
(50 kV, 20 s and 800 mA). The core surface was covered with a 4-μm-thick
SPEX CertiPrep Ultralene foil to avoid contamination of the XRF measurement
unit. The core scanner includes a Canberra X-PIPS Silicon Drift Detector (SDD;
Model SXD 15C-150-500) with 150-eV X-ray resolution, a Canberra
Digital Spectrum Analyzer DAS 1000 and an Oxford Instruments 100W
Neptune X-ray tube with rhodium target material. Raw data spectra were processed
using the WIN AXIL package from Canberra Eurisys.

To bridge coring gaps and hiatuses, cores from all three parallel holes (A, B and 
C) were used, and a new composite depth was derived from the XRF data. Because 
cores from hole C had been frozen and re-thawed, we used this hole only where 
material from the other two holes was unavailable. The higher water content of 
Site 658C resulting from the different treatment affected the Al measurements, as 
is commonly observed. We therefore adjusted the Zr/Al ratios from Site 658C by 
regression to Zr/Al from Site 658B for a section with overlapping data (Sup- 
plementary Fig. 3). 
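
The adjustment of hole C to hole B described above can be sketched as an ordinary least-squares fit over the overlapping section. This is a minimal illustration and not the authors' actual procedure: the function names and the Zr/Al values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def adjust_to_reference(values, slope, intercept):
    """Map ratios from the offset hole onto the scale of the reference hole."""
    return [slope * v + intercept for v in values]


# Hypothetical Zr/Al count ratios at matching depths in the overlapping section.
zr_al_hole_c = [0.10, 0.20, 0.30, 0.40]
zr_al_hole_b = [0.25, 0.45, 0.65, 0.85]

slope, intercept = fit_line(zr_al_hole_c, zr_al_hole_b)
adjusted_c = adjust_to_reference(zr_al_hole_c, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print([round(v, 3) for v in adjusted_c])
```

Once fitted on the overlap, the same slope and intercept would be applied to hole C values from intervals where holes A and B have no material.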

The age model for the record from Site 658 was derived by transferring the
published benthic δ¹⁸O record onto the new composite depth and aligning the
record to the same data as contained in the LR04 benthic stack (Supplementary
Fig. 3). For the past 15 kyr, the age model was further refined by aligning XRF Ca
counts from Site 658A and Site 658B to percentage CaCO₃ data from
radiocarbon-dated Site 658C, and for the period between 20 and 130 kyr ago, additional
tie-points were derived by matching Zr/Al to the humidity index of core GeoB7920-2.

Because Zr from zircon minerals is dominantly present in the coarse silt fraction,
whereas Al is most abundant in clay minerals, the Zr/Al ratio is interpreted to
reflect grain size (Supplementary Fig. 3). Similarly, the Zr/Rb ratio has been shown
to covary with grain size in this region, but the low signal-to-noise ratio of Rb in
our XRF measurements led us to use Al instead. The dust deposited at Site 658 is
mainly transported by the northeast trade winds, and its dominant grain type is
coarse silt. Increasing Zr/Al ratios at Site 658 should therefore reflect increasing
aeolian transport of Saharan dust.

Map of excess phosphorus and wind velocities. The excess phosphorus map
shown in Fig. 1 uses World Ocean Atlas data and the software Ocean Data
View. Winds were plotted using the IRI Climate and Society Map Room
(http://iridl.ldeo.columbia.edu/maproom).
Groundwater drawn daily from shallow alluvial sands by millions of
wells over large areas of south and southeast Asia exposes an esti- 
mated population of over a hundred million people to toxic levels of 
arsenic’. Holocene aquifers are the source of widespread arsenic 
poisoning across the region’. In contrast, Pleistocene sands depo- 
sited in this region more than 12,000 years ago mostly do not host 
groundwater with high levels of arsenic. Pleistocene aquifers are 
increasingly used as a safe source of drinking water* and it is there- 
fore important to understand under what conditions low levels of 
arsenic can be maintained. Here we reconstruct the initial phase of 
contamination of a Pleistocene aquifer near Hanoi, Vietnam. We 
demonstrate that changes in groundwater flow conditions and the 
redox state of the aquifer sands induced by groundwater pumping 
caused the lateral intrusion of arsenic contamination more than 120 
metres from a Holocene aquifer into a previously uncontaminated 
Pleistocene aquifer. We also find that arsenic adsorbs onto the aqui- 
fer sands and that there is a 16-20-fold retardation in the extent of 
the contamination relative to the reconstructed lateral movement of 
groundwater over the same period. Our findings suggest that arsenic 
contamination of Pleistocene aquifers in south and southeast Asia as 
a consequence of increasing levels of groundwater pumping may 
have been delayed by the retardation of arsenic transport. 

This study reconstructs the initial phase of contamination of an 
aquifer containing low levels of arsenic (low-As) in the village of Van 
Phuc, located 10 km southeast of Hanoi on the banks of the Red River. A 
key feature of the site is the juxtaposition of a high-As aquifer upstream 
of a low-As aquifer in an area where pumping for the city of Hanoi has 
dominated lateral groundwater flow for the past several decades 
(Fig. 1a). Many residents of the village of Van Phuc still draw water 
from their 30-50-m-deep private wells. In the western portion of the 
village, the wells typically contain less than 10 μg of As per litre of water
and therefore meet the World Health Organization guideline for As in 
drinking water, whereas As in the groundwater from most wells in 
eastern Van Phuc exceeds this guideline by a factor of 10-50 (ref. 5). 

Drilling and sediment dating in the area has shown that low-As 
groundwater is drawn from orange-coloured sands deposited over 
12,000 years ago, whereas high-As groundwater is typically in contact 
with grey sands deposited less than 5,000 years ago*’. We examined to 
what extent the boundary between the low-As and high-As aquifers of 
Van Phuc has shifted in response to groundwater withdrawals in 
Hanoi. This large-scale perturbation spanning several decades has 
implications for low-As aquifers throughout Asia that are vulnerable 
to contamination owing to accelerated groundwater flow. 

The collection of sediment cores and the installation of monitoring
wells were concentrated along a transect trending southeast to northwest
that extends over a distance of 2.2 km from the bank of the Red
River (Fig. 1b). Groundwater heads, and therefore the groundwater
velocity field, within Van Phuc respond rapidly to the daily and seasonal 
cycles in the water level of the river (Supplementary Information). 
Before large-scale groundwater withdrawals, rainfall was sufficient to 
maintain groundwater discharge to the river, as is still observed else- 
where along the Red River®. In Van Phuc, however, the groundwater 
level was on average 40 cm below that of the water level of the Red River 
in 2010-11 and the hydraulic gradient nearly always indicated flow 
from the river into the aquifer. The reversal of the natural head gradient 
is caused by the large depression in groundwater level centred 10 km to 
the northwest that induces groundwater flow along the Van Phuc 
transect from the river towards Hanoi (Fig. 1a). This perturbation of 
groundwater flow is caused by massive pumping for the municipal 
water supply of Hanoi’”’, which nearly doubled from 0.55 million to 
0.90 million cubic metres per day between 2000 and 2010 owing to the 
rapid expansion of the city (Supplementary Fig. 1). 

A change in the colour of a clay layer capping sandy sediment along 
the transect defines a geological boundary between the two portions of 
the Van Phuc aquifer. Up to a distance of 1.7 km from the river bank, 
the clay capping the aquifer is uniformly grey with the exception of a 
thin brown interval at the very surface (Fig. 2b). In contrast, a readily 
identifiable sequence of highly oxidized bright yellow, red and white 
clays was encountered between 12m and 17 m depth at all drill sites 
along the transect beyond a distance of 1.7 km from the river bank. 
This oxidized clay layer is probably a palaeosol dating to the last sea- 
level low-stand about 20,000 years ago”. 

The colour of aquifer sands below the upper clay layer also changes 
markedly along the Van Phuc transect. Sand colour in fluvio-deltaic 
deposits is controlled primarily by the extent to which Fe(III) has been
reduced to Fe(II) by the decomposition of organic carbon. Up to a
distance of 1.6 km from the river bank, sandy drill cuttings within the
20-40 m depth range are uniformly grey. The predominance of orange 
sands beyond 1.6 km indicates oxidation during the previous sea-level 
low-stand. After the sea level rose back to its current level, the nature of 
the remaining organic carbon precluded a new cycle of Fe(III) reduction.

Independently of sediment colour, the calcium (Ca) content of sand 
cuttings collected while drilling along the Van Phuc transect confirms 
that a geological boundary extends to the underlying aquifer sands. 
Within the southeastern portion of the aquifer that is not capped by the 
presumed palaeosol, X-ray fluorescence measurements indicate Ca 
concentrations of over 2,000 mg Ca per kg of sand in cuttings to a 
depth of 30 m (Fig. 2a). The groundwater in this portion of the aquifer 
is supersaturated with respect to calcite and dolomite’, suggesting that 
authigenic precipitation is the source of Ca in the grey drill cuttings, as 
previously proposed elsewhere” (Supplementary Fig. 2). At a distance 
of 1.7 km from the river and further to the northwest, instead, the Ca 
content of orange sand cuttings systematically remains less than 100 mg
Ca per kg and the groundwater is undersaturated with respect to calcite 
and dolomite. Unlike surficial shallow grey clays, the Ca content of the 
presumed palaeosol is also very low (<100 mg Ca per kg) and consist- 
ent with extensive weathering. 

The redox state of the aquifer has a major impact on the composi- 
tion of groundwater in Van Phuc, as reported elsewhere in Vietnam” 
and across south and east Asia more generally’. High but harmless 
Fe(II) concentrations in groundwater (10-20 mg per litre) associated
with grey reducing sediments are apparent to residents of eastern Van
Phuc as an orange Fe(III) precipitate that forms in their water upon
exposure to air (Supplementary Fig. 3). In contrast, the high and toxic
concentrations of As in groundwater at 20-30 m depth within the
Figure 1 | Map of the Hanoi area extending south to the study site.

a, Location of the village of Van Phuc in relation to the cone of depression 
formed by groundwater pumping for the municipal water supply of Hanoi 
(white contours, adapted from ref. 10). Urbanized areas are shown in grey; 
largely open fields are shown in green. b, Enlarged view of Van Phuc (box shows 
location in a) from Google Earth showing the location of the transect along 
which groundwater and sediment were collected, with tickmark labels 
indicating distance from the Red River bank in kilometres. Symbol colour 
distinguishes the uniformly grey Holocene aquifer (red), the Pleistocene aquifer 
contaminated with As (yellow), the Pleistocene aquifer where the groundwater 
conductivity and dissolved inorganic carbon concentrations are high but As 
concentrations are not (green), and the Pleistocene aquifer without indication 
of contamination (blue), all within the 25-30-m depth interval. Three white 
asterisks identify the wells that were used to determine flow direction. Image 
copyright 2012 Digital Globe Google Earth. c, Rose diagram frequency plot of 
the head gradient direction based on data collected at 5-min intervals (numbers 
indicate the number of observations) from these three wells from September 
2010 to June 2011. 


same portion of the transect, ranging from 200 μg per litre near the
river to levels as high as 600 μg per litre at 1.2-1.6 km from the river
bank, are invisible (Fig. 2c). The groundwater in contact with Pleistocene
sands in northwestern Van Phuc is also anaerobic but contains
less than 0.5 mg Fe(II) per litre and less than 10 μg As per litre and
shows little indication of organic carbon mineralization compared to
the Holocene aquifer (Supplementary Fig. 4).

The Pleistocene portion of the Van Phuc aquifer adjacent to the 
Holocene sediment is not uniformly orange or low in As. Of particular 
interest is a layer of grey sand at 25-30 m depth extending to the
northwest at a distance of 1.7-1.8 km from the river bank (Fig. 2b).
The intercalation of grey sand between orange sands above and below,
combined with the low Ca content of sand cuttings within this layer,
indicates that it was deposited during the Pleistocene and was therefore,
until recently, oxidized and orange in colour. Within the portion of
the Pleistocene aquifer that became grey and is closest to the geological
boundary, groundwater As concentrations are therefore presumed to
have been originally very low (<5 μg per litre). Actual As concentrations
of 100-500 μg per litre, as high as in the adjacent Holocene
aquifer, indicate contamination extending over a distance of about
120 m into the Pleistocene aquifer (Fig. 3a).

A subset of the transect wells was sampled in 2006 and analysed for
tritium (³H) as well as noble gases in order to measure groundwater
ages and determine the rate of As intrusion into the Pleistocene aquifer.
Atmospheric nuclear weapons testing in the 1950s and 1960s is the
main source of ³H that entered the hydrological cycle. The distribution
of ³H indicates that only groundwater in the southeastern high-As
portion of the aquifer contains a plume of recharge dating from the
1950s and later. Concentrations of ³He, the stable decay product of ³H,
were used to calculate groundwater ages for eight wells in the 24-42-m
depth range with detectable levels of ³H. In 2006, the oldest water dated
by the ³H-³He method (Supplementary Fig. 5) was sampled at a distance
of 1.6 km from the river, which is the most northwestern location
along the transect where the aquifer is uniformly grey (Fig. 2b, d).
Younger ages of 15 years and 17 years were measured closer to the
river at 1.3 km and 1.5 km, respectively. Concentrations of ³H, groundwater
³H-³He ages, and hydraulic head gradients consistently indicate
that the Holocene aquifer has been recharged by the river from the
southeast within the past few decades.
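
For readers unfamiliar with the dating approach, the standard ³H-³He age relation is t = (1/λ) ln(1 + ³He_trit/³H), where λ is the tritium decay constant. The sketch below applies that textbook relation; it is not taken from the paper's Supplementary Information, and the half-life value (12.32 yr) and the input concentrations are assumptions for illustration.

```python
import math

TRITIUM_HALF_LIFE_YR = 12.32  # assumed, commonly cited value


def tritium_helium_age(he3_tritiogenic_tu, h3_tu):
    """Apparent age (years) from tritiogenic 3He and remaining 3H, both in TU."""
    decay_constant = math.log(2) / TRITIUM_HALF_LIFE_YR
    return math.log(1.0 + he3_tritiogenic_tu / h3_tu) / decay_constant


# Hypothetical sample: as much tritiogenic 3He as remaining 3H, so exactly
# one half-life has elapsed since the water was isolated from the atmosphere.
print(f"{tritium_helium_age(5.0, 5.0):.2f} yr")
```

The relation assumes that the tritiogenic ³He has stayed in the water, which is why the wells are sampled in a way that avoids degassing.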

Drilling and geophysical data indicate that the main groundwater 
recharge area extends from the centre of the Red River to the inland 
area where a surficial clay layer thickens markedly, that is, from 100 m 
southeast to 300 m northwest of the river bank (Supplementary Fig. 6). 
The relationship between groundwater ages and travel distance from 
the recharge area implies accelerating flow drawn by increased Hanoi 
pumping (Supplementary Fig. 7). A simple transient flow model 
for the Van Phuc aquifer yields average advection rates of 38 m yr⁻¹
and 48 m yr⁻¹ towards Hanoi since 1951 and 1971, respectively (Supplementary
Discussion). According to these two pumping scenarios,
Figure 2 | Contoured sections of sediment and water properties based on
data collected between 1.3 km and 2.0 km from the Red River bank. The 
location and number of samples indicated as black dots varies by type of 
measurement. a, Concentration of Ca in sand cuttings measured by X-ray 
fluorescence. Also shown are the boundaries separating the two main aquifers 
and the palaeosol overlying the Pleistocene aquifer. ‘2000’ labels the contour for
2,000 mg Ca per kg. b, Difference in diffuse spectral reflectance between 530 nm
and 520 nm, indicative of the colour of freshly collected drill cuttings. The
contour labels correspond to the percentage difference in reflectance shown by 
the colour scale. c, Concentrations of As in groundwater collected in 2006 with 
the needle sampler and in 2011 by monitoring wells along the transect. ‘10’
labels the contour for the WHO guideline, 10 μg As per litre. d, Groundwater
ages relative to recharge determined by ³H-³He dating of groundwater samples
collected from a subset of the monitoring wells in 2006. The portion of the 
Pleistocene aquifer that became reduced and where As concentrations 
presumably increased over time is located within the large white arrow pointing 
in the direction of flow. The plot was drawn with Ocean Data View 
(http://odv.awi.de/). 


groundwater originating from the Holocene portion of the aquifer was 
transported 2,000-2,300 m into the Pleistocene sands by 2011, when the 
transect was sampled for analysis of As and other groundwater constituents. 
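
The transport distances quoted for the two pumping scenarios follow approximately from rate multiplied by elapsed time. The sketch below reads the advection rates as metres per year and takes 2011 as the end year; the function name is hypothetical.

```python
def advected_distance_m(rate_m_per_yr, start_year, end_year=2011):
    """Distance travelled by groundwater at a constant average advection rate."""
    return rate_m_per_yr * (end_year - start_year)


# The two pumping scenarios given in the text.
since_1951 = advected_distance_m(38.0, 1951)
since_1971 = advected_distance_m(48.0, 1971)

print(f"scenario from 1951: {since_1951:.0f} m")
print(f"scenario from 1971: {since_1971:.0f} m")
```

Both values lie close to the 2,000-2,300 m range stated above; the small differences presumably reflect rounding or details of the transient flow model in the Supplementary Discussion.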

The sharp decline in As concentrations between 1.60 km and
1.75 km from the river bank indicates that migration of the As front 
across the geological boundary was retarded by a factor of 16 to 20 
relative to the movement of the groundwater (Fig. 3a). Without 
retardation, attributable to As adsorption onto aquifer sands, the entire 
Pleistocene aquifer of Van Phuc would already be contaminated. The 
retardation is derived from several decades of perturbation and is at the 
low end of previous estimates by other methods, typically measured 
within days to weeks'’**, and therefore predicts greater As mobility 
than most previous studies. The retardation measured in Van Phuc 
integrates the effect of competing ions typically present at higher con- 
centrations in the Holocene aquifer (Supplementary Fig. 4) as well as 
Figure 3 | Distribution of arsenic and dissolved organic carbon in
groundwater within the 25-30-m depth interval along the Van Phuc 
transect. Symbols are coloured according to the classification in Fig. 1. Grey and 
orange shading indicates the extent of the grey Holocene aquifer and the portion 
of the Pleistocene aquifer that is still orange, respectively. The intermediate area 
without shading indicates the portion of the Pleistocene aquifer that became grey. 
Shown as dotted lines are predicted As concentrations bracketing the 
observations with retardation factors R of 16 and 20 and an average advection 
velocity of 43 m yr⁻¹ over the 50 years preceding the 2011 sampling
(Supplementary Discussion). a, Also shown are predicted concentrations for As 
assuming retardation factors of 1, 5 and 40 and the same average rate of advection. 
b, For visual reference, predicted dissolved organic carbon concentrations are 
shown as dotted lines according to the same advection velocity and retardation 
factors of 16, 20 and 40, assuming there was no detectable dissolved organic 
carbon in the Pleistocene aquifer before the perturbation. 


the impact of Fe oxyhydroxide reduction. However, the extent to which 
contamination was caused by either As transport from the adjacent 
Holocene aquifer or reductive dissolution of Fe(III) oxyhydroxides and
in situ As release to groundwater cannot be determined from the 
available data (Supplementary Fig. 8). 
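
The retardation factor discussed above can be approximated as the ratio of the distance travelled by groundwater to the distance travelled by the As front. The sketch below uses the 2,000-2,300 m groundwater transport and the roughly 120 m As intrusion quoted in the text; treating retardation as a simple distance ratio is a simplifying assumption, and the function name is hypothetical.

```python
def retardation_factor(water_distance_m, solute_distance_m):
    """Ratio of groundwater travel distance to solute front travel distance."""
    return water_distance_m / solute_distance_m


AS_FRONT_M = 120.0  # approximate extent of As contamination in the Pleistocene aquifer

for water_m in (2000.0, 2300.0):
    r = retardation_factor(water_m, AS_FRONT_M)
    print(f"groundwater {water_m:.0f} m -> R = {r:.1f}")
```

The two results fall within the reported range of 16 to 20.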

The sharp drop in dissolved organic carbon concentrations across 
the geological boundary from 9 mg per litre to about 1 mg per litre 
indicates rapid organic carbon mineralization coupled to the reduction 
of Fe(III) oxyhydroxides and explains the formation of a plume of grey
sands within the Pleistocene aquifer (Fig. 3b). On the basis of a stoi- 
chiometric Fe/C ratio of 4 (ref. 15), the dissolved organic carbon sup- 
plied by flushing the aquifer 30 times with groundwater from the 
Holocene aquifer would be required to turn Pleistocene sands from 
orange to grey by reducing half of their 0.1% reactive Fe(III) oxyhydroxide
content, assuming a porosity of 0.25. Given that groundwater
was advected over a distance of 2,000-2,300 m across the geological 
boundary over the past 40-60 years, we would predict that the plume of 
grey sands extends 65-75 m into the Pleistocene aquifer. This is some- 
what less than is observed (Fig. 3), possibly due to additional reduction 
by H₂ advected from the Holocene portion of the aquifer. The Van
Phuc observations indicate that dissolved organic carbon advected 
from a Holocene aquifer can be at least as important for the release 
of As to groundwater as autochthonous organic carbon.
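
The flushing estimate in this paragraph can be reproduced approximately with a mass balance. The sketch below is a rough check under stated assumptions, not the authors' calculation: a grain density of 2.65 kg per litre, a reactive Fe content of 0.1% by weight, and consumption of 8 mg of dissolved organic carbon per litre (the drop from 9 to about 1 mg per litre) are all assumed here.

```python
FE_MOLAR_MASS = 55.845   # g per mol
C_MOLAR_MASS = 12.011    # g per mol


def flushes_to_turn_grey(reactive_fe_wt_fraction=0.001, fraction_reduced=0.5,
                         fe_per_c=4.0, porosity=0.25,
                         grain_density_kg_per_l=2.65, doc_consumed_mg_per_l=8.0):
    """Pore volumes of Holocene groundwater needed to reduce the reactive Fe."""
    fe_mmol_per_kg = reactive_fe_wt_fraction * fraction_reduced * 1e6 / FE_MOLAR_MASS
    c_mg_per_kg = fe_mmol_per_kg / fe_per_c * C_MOLAR_MASS
    # Mass of sediment in contact with one litre of pore water.
    sediment_kg_per_l_water = grain_density_kg_per_l * (1.0 - porosity) / porosity
    c_needed_mg_per_l_water = c_mg_per_kg * sediment_kg_per_l_water
    return c_needed_mg_per_l_water / doc_consumed_mg_per_l


def plume_extent_m(advected_distance_m, flushes):
    """Distance over which the sands have been flushed the required number of times."""
    return advected_distance_m / flushes


print(f"estimated flushes: {flushes_to_turn_grey():.0f}")
for distance in (2000.0, 2300.0):
    print(f"{distance:.0f} m advected -> plume of {plume_extent_m(distance, 30.0):.0f} m")
```

With these assumed inputs the mass balance gives a value in the high twenties, of the same order as the 30 flushes quoted, and dividing the advected distances by 30 gives roughly the 65-75 m plume extent stated in the text.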

Contamination of Pleistocene aquifers has previously been invoked 
in the Red River and the Bengal basins'’'*”*, but without the benefit of 
a well-defined hydrogeological context. The Pleistocene aquifer of Van
Phuc was contaminated under the conducive circumstances of accel- 
erated lateral flow. Although downward groundwater flow and there- 
fore penetration of As will typically be slower, the Van Phuc findings 
confirm that the vulnerability of Pleistocene aquifers will depend on 
the local spatial density of incised palaeo-channels that were subse- 
quently filled with Holocene sediments. Owing to retardation, concentrations
of As in a Pleistocene aquifer will not increase suddenly but
over timescales of decades even in the close vicinity of a Holocene 
aquifer. This is consistent with the gradual increase in groundwater 
As concentrations documented by the few extended time series avail- 
able from such a vulnerable setting”. However, concentrations of As 
could rise more rapidly where flow accelerates beyond the rate docu- 
mented in Van Phuc, closer to Hanoi for instance. 


METHODS SUMMARY 


A total of 41 wells were installed in Van Phuc in 2006-11. The water levels of the 
river and in the wells were recorded from September 2010 to June 2011 using 
pressure transducers and adjusted to the same elevation datum after barometric 
corrections. The magnitude and direction of the head gradient within the 25-30-m
depth interval were calculated from water level measurements in three wells
(Fig. 1b). In 2006, a subset of the wells was sampled for noble gas and tritium
(³H) analysis at a high flow rate using a submersible pump to avoid degassing. The
samples were analysed by mass spectrometry in the Noble Gas Laboratory at ETH
Zurich. ³H concentrations were determined by the ³He ingrowth method.
Groundwater As, Fe and Mn concentrations measured by high-resolution induc- 
tively coupled plasma mass spectrometry at LDEO represent the average for 
acidified samples collected in April and May 2012. Further details are provided 
in the Supplementary Information. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 


Drilling. A first set of 25 wells, including two nests of nine and ten wells tapping 
the depth range of the Holocene and Pleistocene aquifers, respectively, was 
installed in Van Phuc in 2006 (ref. 6). Another 16 monitoring wells were installed 
between December 2009 and November 2011. Three additional holes were drilled 
to collect cuttings without installing a well. All holes were drilled by flushing the 
hole with water through a rotating drill bit. 

Needle sampling. In 2006, drilling was briefly interrupted at seven sites to increase 
the vertical resolution of both sediment and groundwater data using the needle 
sampler³¹. Groundwater was pressure-filtered under nitrogen directly from the 
sample tubes. As a measure of the pool of mobilizable As, sediment collected with 
the needle sampler was subjected to a single 24-hour extraction in a 1 M PO₄ 
solution at pH 5 (ref. 32). 

Water level measurements. A theodolite elevation survey of the well and river 
measurement points was carried out in June 2010 by a surveying team from 
Hanoi University of Science. Water level data in both the wells and the river were 
recorded using Solinst Levelogger pressure transducers. A barometric pressure 
logger was also deployed at the field site. Water level and barometric data were 
recorded at 5-min intervals and all water level data were barometrically corrected. 
The barometrically corrected water level data from each logger were then adjusted 
to the surveyed elevation of the respective measurement point so that all of the 
data were referenced to the same elevation datum. 

Groundwater flow. The magnitude and direction of the head gradient within the 
25-30-m depth of the aquifer at Van Phuc was calculated using the barometrically 
adjusted and survey-referenced water level data collected at 5-min intervals from 
September 2010 to June 2011 in three wells located near the centre of the transect 
(Fig. 1b). A least-squares fit of a plane was calculated for each set of simultaneous 
water levels at these three wells, and from this set of planes the magnitude and 
direction of the head gradient at 5-min intervals were directly computed. 
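
The plane fit described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes well positions given in metres in a local east/north frame, and the function name `head_gradient` and the azimuth convention (degrees clockwise from north) are hypothetical choices made here.

```python
import numpy as np

def head_gradient(xy, heads):
    """Fit the plane h = a*x + b*y + c to simultaneous water levels.

    xy    : sequence of (east, north) well coordinates in metres (assumed frame)
    heads : water-level elevations in metres, referenced to one datum
    Returns (gradient magnitude, azimuth of down-gradient flow in degrees).
    """
    xy = np.asarray(xy, dtype=float)
    heads = np.asarray(heads, dtype=float)
    # Design matrix for the plane; with three wells the fit is exact.
    design = np.column_stack([xy[:, 0], xy[:, 1], np.ones(len(xy))])
    (a, b, _c), *_ = np.linalg.lstsq(design, heads, rcond=None)
    magnitude = float(np.hypot(a, b))
    # Groundwater moves down-gradient, i.e. along the vector (-a, -b).
    azimuth = float(np.degrees(np.arctan2(-a, -b)) % 360.0)
    return magnitude, azimuth
```

Applying the function to each 5-min set of water levels would give a time series of gradient magnitude and direction. With exactly three wells the plane is fully determined, so the least-squares form only matters if further wells are added.
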

Groundwater analysis. In 2006, a subset of the monitoring wells was sampled 
along a vertical transect for noble gas and tritium (³H) analysis. After purging the 
wells, the samples were taken using a submersible pump. To avoid degassing of the 
groundwater owing to bubble formation during sampling, the water was pumped 
at high rates to maintain high pressure. The samples for noble gas and ³H analysis 
were put into copper tubes and sealed gastight using pinch-off clamps. All samples 
were analysed for noble gas concentrations and the isotope ratios ³He/⁴He, 
²⁰Ne/²²Ne and ⁴⁰Ar/³⁶Ar using noble gas mass spectrometry in the Noble Gas 
Laboratory at ETH Zurich³³,³⁴. ³H concentrations were determined by the ³He 
ingrowth method using a high-sensitivity compressor-source noble gas mass 
spectrometer. ³H-³He ages were calculated according to the equations listed in 
ref. 34, taking into account an excess air correction. When comparing the 
reconstructed original ³H content of each sample as a function of ³H-³He age with the 
³H input function for south and southeast Asia (Supplementary Fig. 5), most 
samples follow the trend expected from simple plug flow³⁴,³⁵. 
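
For orientation, the standard ³H-³He dating relation can be written as a short sketch. It is not a reproduction of the equations in ref. 34: it assumes the tritiogenic ³He has already been separated from atmospheric and excess-air helium, that both quantities are expressed in the same units (for example tritium units), and a tritium half-life of 12.32 years. The function names are hypothetical.

```python
import math

TRITIUM_HALF_LIFE_YR = 12.32  # assumed half-life of tritium, in years

def tritium_helium_age(he3_tritiogenic, h3):
    """Apparent age (years) from tritiogenic 3He and remaining 3H.

    Both inputs must be in the same units; the excess-air correction is
    assumed to have been applied beforehand.
    """
    decay_constant = math.log(2) / TRITIUM_HALF_LIFE_YR
    return math.log(1.0 + he3_tritiogenic / h3) / decay_constant

def initial_tritium(he3_tritiogenic, h3):
    """Reconstructed original 3H content: remaining 3H plus its decay product."""
    return h3 + he3_tritiogenic
```

Plotting `initial_tritium` against `tritium_helium_age` for each sample is the kind of comparison with the ³H input function that the text describes.
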

Several days before analysis by high-resolution inductively coupled plasma 
mass spectrometry at LDEO, groundwater was acidified to 1% Optima HCl in 
the laboratory³⁶. This has been shown to re-dissolve entirely any precipitates that 
could have formed³⁷. In most cases, the difference between duplicates was within 
the analytical uncertainty of ~5%. With the exception of needle-sample data and 
the nest of ten wells in the Holocene portion of the aquifer, which had to yield to 
construction, groundwater As, Fe and Mn concentrations reported here represent 
the average for samples collected without filtration in April and May 2012. 
Groundwater data from 2006 were previously reported in refs 6 and 31. 

Dissolved organic carbon samples were collected in 25-ml glass vials combusted 
overnight at 450 °C and acidified to 1% HCl at the time of collection. Dissolved 
inorganic carbon samples were also collected in 25-ml glass vials with a Teflon 
septum but were not acidified. Both dissolved organic carbon (“NPOC”) and 
dissolved inorganic carbon (by difference of “TC-NPOC”) were analysed on a 
Shimadzu TOC-V carbon analyser calibrated with K phthalate standards. 

Ammonium samples were collected in polypropylene bottles after passing 
through 0.45 µm cellulose acetate membrane filters and preserved by acidifying 
to pH <2 with HNO₃. NH₄⁺ concentrations were analysed on a 
spectrophotometer (UV-3101, Shimadzu) at a wavelength of 690 nm after forming a complex 
with nitroferricyanide³⁸. 

Methane (CH₄) samples were collected by filling pre-evacuated glass vials to 
about half their volume and were immediately frozen in dry ice. The analyses 
were performed no later than ten days after sampling. Headspace CH₄ in the vials 
was measured on a Shimadzu 2014 gas chromatograph with a Porapak T packed column. 

Sediment analysis. As a measure of the redox state of Fe in acid-leachable 
oxyhydroxides, the diffuse spectral reflectance spectrum of cuttings from all sites was 
measured within 12 hours of collection, on samples wrapped in Saran wrap and kept 
out of the sun, using a Minolta 1600D instrument. Starting in 2009, the 
coarse fractions of the drill cuttings were analysed by X-ray fluorescence for a 
suite of elements including Ca using an InnovX Delta instrument. The drill 
cuttings were resuspended in water several times to eliminate the overprint of 
Ca-enriched clays contained in the recycled water used for drilling. The washed 
samples were run as is, without drying or grinding to powder. Analyses of NIST 
reference material SRM2711 (28,800 ± 800 mg Ca per kg) by X-ray 
fluorescence at the beginning and end of each run averaged 30,200 ± 400 mg Ca 
per kg (n = 16). 


31. van Geen, A. et al. Comparison of arsenic concentrations in simultaneously-
collected groundwater and aquifer particles from Bangladesh, India, Vietnam, 
and Nepal. Appl. Geochem. 23, 3244-3251 (2008). 

32. Zheng, Y. et al. Geochemical and hydrogeological contrasts between shallow and 
deeper aquifers in two villages of Araihazar, Bangladesh: implications for deeper 
aquifers as drinking water sources. Geochim. Cosmochim. Acta 69, 5203-5218 
(2005). 

33. Frei, F. Groundwater dynamics and arsenic mobilization near Hanoi (Vietnam) 
assessed using noble gases and tritium. Diploma thesis, ETH Zurich (2007). 

34. Klump, S. et al. Groundwater dynamics and arsenic mobilization in Bangladesh 
assessed using noble gases and tritium. Environ. Sci. Technol. 40, 243-250 
(2006). 

35. Stute, M. et al. Hydrological control of As concentrations in Bangladesh 
groundwater. Wat. Resour. Res. 43, W09417 (2007). 

36. Cheng, Z., Zheng, Y., Mortlock, R. & van Geen, A. Rapid multi-element analysis of 
groundwater by high-resolution inductively coupled plasma mass spectrometry. 
Anal. Bioanal. Chem. 379, 512-518 (2004). 

37. van Geen, A. et al. Monitoring 51 deep community wells in Araihazar, Bangladesh, 
for up to 5 years: implications for arsenic mitigation. J. Environ. Sci. Health A 42, 
1729-1740 (2007). 

38. Koroleff, F. In Methods of Seawater Analysis (ed. Grasshoff, K.) 126-133 (Chemie, 
1974). 


©2013 Macmillan Publishers Limited. All rights reserved 




doi:10.1038/nature12490 


Non-chondritic sulphur isotope composition of the terrestrial mantle 


J. Labidi¹, P. Cartigny¹ & M. Moreira² 


Core-mantle differentiation is the largest event experienced by a 
growing planet during its early history. Terrestrial core segregation 
imprinted the residual mantle composition by scavenging siderophile 
(iron-loving) elements such as tungsten, cobalt and sulphur. 
Cosmochemical constraints suggest that about 97% of Earth’s sulphur 
should at present reside in the core¹, which implies that the 
residual silicate mantle should exhibit fractionated ³⁴S/³²S ratios 
according to the relevant metal-silicate partition coefficients², together 
with fractionated siderophile element abundances. However, Earth’s 
mantle has long been thought to be both homogeneous and chondritic 
for ³⁴S/³²S, similar to Canyon Diablo troilite³, as it is for most 
siderophile elements. This belief was consistent with a mantle sulphur 
budget dominated by late-accreted chondritic components. Here we 
show that the mantle, as sampled by mid-ocean ridge basalts from the 
south Atlantic ridge, displays heterogeneous ³⁴S/³²S ratios, directly 
correlated to the strontium and neodymium isotope ratios ⁸⁷Sr/⁸⁶Sr 
and ¹⁴³Nd/¹⁴⁴Nd. These isotope trends are compatible with binary 
mixing between a low-³⁴S/³²S ambient mantle and a high-³⁴S/³²S 
recycled component that we infer to be subducted sediments. The 
depleted end-member is characterized by a significantly negative 
δ³⁴S of −1.28 ± 0.33‰ that cannot reach a chondritic value even 
when surface sulphur (from continents, altered oceanic crust, sediments 
and oceans) is added. Such a non-chondritic ³⁴S/³²S ratio for 
the silicate Earth could be accounted for by a core-mantle 
differentiation record in which the core has a ³⁴S/³²S ratio slightly higher 
than that of chondrites (δ³⁴S = +0.07‰). Despite evidence for late-veneer 
addition of siderophile elements (and therefore sulphur) after 
core formation, our results imply that the mantle sulphur budget 
retains fingerprints of core-mantle differentiation. 

Earlier reports of δ³⁴S in mid-ocean ridge basalts (MORBs) (where 
δ³⁴S = [(³⁴S/³²S)_sample/(³⁴S/³²S)_CDT − 1] × 1,000 and CDT is Canyon 
Diablo troilite) yielded values statistically indistinguishable from the 
reported chondrite average of 0.04 ± 0.31‰ (refs 3-6), consistent with 
a late-veneer origin for sulphur in the mantle. This view has 
recently been questioned in a study using improved sulphur-extraction 
techniques⁷. Worldwide MORBs exhibit exclusively negative δ³⁴S ranging 
down to −1.9‰, with an approximately 2‰ variability⁷. These observations 
raise questions as to whether the terrestrial mantle is chondritic in 
sulphur isotopes, but the mechanisms controlling MORB ³⁴S/³²S 
variability remain unclear. 

To address these issues, we investigated the sulphur isotope 
composition of 23 glasses dredged on the south Atlantic ridge between 40° 
and 55° S. Previous radiogenic isotope⁸⁻¹⁰, noble gas¹¹,¹², volatile¹³, 
major-element¹⁴ and trace-element¹⁵ measurements on these basalts 
illustrate interactions of the Shona-Discovery hotspots with the ridge. 
Several of the typical mantle end-members feed the mantle source of 
the two plumes, including HIMU (‘high-μ’, where μ = ²³⁸U/²⁰⁴Pb), 
LOMU (‘low-μ’) and enriched-mantle components. The samples 
analysed were chosen to reflect this geochemical variability and thus offer an 
opportunity to address ³⁴S/³²S variations with respect to these mantle 
heterogeneities. 


In our samples, sulphur occurs only in its reduced form, as commonly 
found in MORBs, and the sulphur content, from 642 to 1,388 parts 
per million (p.p.m.), correlates with the FeO content (Supplementary 
Fig. 3A), illustrating magmatic sulphide saturation. The eruption depth 
of our samples precludes any significant sulphur degassing, restricting 
the sulphur isotope discussion to primary magmatic considerations. 
δ³⁴S varies between −1.80‰ and +1.05‰, whereas Δ³³S and Δ³⁶S, which 
quantify mass-independent signatures (where Δ³³S = δ³³S − 
1,000 × [(δ³⁴S/1,000 + 1)^0.515 − 1] and Δ³⁶S = δ³⁶S − 
1,000 × [(δ³⁴S/1,000 + 1)^1.889 − 1]), are homogeneous and equal within 
uncertainty to those of Canyon Diablo troilite (Supplementary Table 1). Moreover, 
δ³⁴S is remarkably well correlated with source enrichment proxies such as 
⁸⁷Sr/⁸⁶Sr or ¹⁴³Nd/¹⁴⁴Nd (Fig. 1). Seawater sulphate (δ³⁴S = +21‰) 
assimilation by erupted basalts could result in an increase in δ³⁴S; it 
would, however, scatter or erase any correlations between δ³⁴S and 
tracers that remain insensitive to seawater incorporation (such as 
¹⁴³Nd/¹⁴⁴Nd). It would also result in a relationship between δ³⁴S and 
proxies of seawater incorporation such as chlorine contents or Cl/K 
ratios, which is again not the case (Supplementary Fig. 4). Correlated 
δ³⁴S and radiogenic isotope enrichments thus reflect the variability of 
the MORB sources. 
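
The δ and Δ definitions used in this section can be written out as a small sketch. The exponents 0.515 and 1.889 are the mass-dependent reference slopes as read from the definitions above; the function names are hypothetical and the numbers in any call are illustrative, not data from the study.

```python
def delta(ratio_sample, ratio_standard):
    """Delta value in per mil of an isotope ratio relative to a standard (CDT here)."""
    return (ratio_sample / ratio_standard - 1.0) * 1000.0

def cap_delta33(d33, d34, slope=0.515):
    """Deviation of d33S from the mass-dependent line defined by d34S (per mil)."""
    return d33 - 1000.0 * ((d34 / 1000.0 + 1.0) ** slope - 1.0)

def cap_delta36(d36, d34, slope=1.889):
    """Deviation of d36S from the mass-dependent line defined by d34S (per mil)."""
    return d36 - 1000.0 * ((d34 / 1000.0 + 1.0) ** slope - 1.0)
```

A sample that lies exactly on the mass-dependent line returns zero from both capital-delta functions, which is why homogeneous, CDT-like Δ values indicate the absence of mass-independent signatures.
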

All the samples are saturated with immiscible sulphide, and the 
extremely low osmium contents in the most primitive MORBs probably 
indicate that they were already sulphide-saturated during melting. 
Such a saturation mechanism prevents us from constraining the 
sulphur contents of the mantle source, and because the fractionation factor 
between dissolved sulphur and magmatic sulphide might be significant, 
it raises the question of whether basalts can preserve the ³⁴S/³²S ratios 
of their mantle source. The basalts investigated have such distinct 
differentiation and melting histories that the amount of sulphide 
left during both melting and differentiation is variable. If the 
high-temperature sulphur isotope fractionation during magmatic sulphide 
exsolution were significant, it would have erased the observed 
δ³⁴S-⁸⁷Sr/⁸⁶Sr-¹⁴³Nd/¹⁴⁴Nd trends (Fig. 1). The lack of correlation 
between δ³⁴S and sulphide segregation proxies indicates a fractionation 
factor, α_sulph-melt, of between 1.0000 and 0.9995 ± 0.0005, at most 
(Supplementary Information). The δ³⁴S of basalts hence provides direct 
information on mantle source composition, here displaying an average 
value of −0.80 ± 0.58‰ (1σ). 

The most ³⁴S-depleted samples correspond to depleted MORB and 
³⁴S-enriched samples are related to the Discovery plume, whereas the 
Shona and LOMU basalts are of intermediate composition. Interestingly, 
samples from Shona and Discovery display similar ³He/⁴He and 
²¹Ne/²²Ne (refs 11 and 12) but have different ³⁴S enrichments, suggesting 
that these isotopic systems are decoupled. Such a lack of correlation 
between noble gases and S isotopes suggests that the primitive mantle 
is sulphur-poor compared to the recycled components. Hence, 
characterization of the primordial sulphur cannot be directly achieved 
from the present data set. 

Our results can be explored in two ways. First, the south Atlantic 
depleted mantle has an average δ³⁴S of −1.28 ± 0.33‰ (error is 1σ, 
n = 6), defining a mean value consistently distinct from chondrites. 
We note that an extrapolation of Sr-Nd-S isotope trends to the most 
depleted compositions would lead to a depleted-mantle endmember 
as low as −1.80‰ (Fig. 1). Second, the nature of classical mantle 
end-members can be addressed. Whereas it is commonly accepted that 
HIMU features reflect the occurrence of subducted oceanic crust in 
plume sources²⁰, the enriched-mantle and LOMU cases remain much 
debated. In the south Atlantic mantle, a significant contribution of 
continental material has been invoked to account for the LOMU 
end-member, including either delaminated subcontinental lithospheric 
mantle or lower continental crust. In S-Sr and S-Nd isotope spaces, 
however, the samples lie on a linear trend regardless of their locations or 
type of anomalies (Fig. 1), highlighting binary mixing between depleted 
mantle (δ³⁴S = −1.28‰ or lower) and the enriched-mantle-type 
end-member (δ³⁴S = +1.05‰ or higher). 

Figure 1 | δ³⁴S versus ⁸⁷Sr/⁸⁶Sr and ¹⁴³Nd/¹⁴⁴Nd. Strontium and neodymium 
isotope data are from ref. 8. Sulphur-isotope uncertainties are estimated on the 
basis of replicate analysis for all samples, and are all within symbol sizes. The 
samples define a binary mixing relationship between a depleted-mantle and an 
enriched-mantle endmember. Mixing trends with sub-continental lithospheric 
mantle (SCLM) and lower continental crust (LCC) are also shown, illustrating that 
mixing with sediments accounts better for the geochemical composition of our 
samples. a, δ³⁴S versus ⁸⁷Sr/⁸⁶Sr. b, δ³⁴S versus ¹⁴³Nd/¹⁴⁴Nd. δ³⁴S is defined in the 
main text; the CDT standard is used. R, correlation coefficient. Data used for the 
mixing calculation are as follows, from refs 22-27 and Supplementary ref. 37. For 
depleted mantle, δ³⁴S = −1.5‰, ⁸⁷Sr/⁸⁶Sr = 0.7025, ¹⁴³Nd/¹⁴⁴Nd = 0.513156, 
Sr = 11.3 p.p.m., Nd = 1.12 p.p.m. and S = 200 p.p.m. For LCC, δ³⁴S = +3‰, 
⁸⁷Sr/⁸⁶Sr = 0.7080, ¹⁴³Nd/¹⁴⁴Nd = 0.511600, Sr = 348 p.p.m., Nd = 11 p.p.m. 
and S = 408 p.p.m. For SCLM, δ³⁴S = +3‰, ⁸⁷Sr/⁸⁶Sr = 0.7100, 
¹⁴³Nd/¹⁴⁴Nd = 0.511663, Sr = 49 p.p.m., Nd = 2.67 p.p.m. and S = 157 p.p.m. 
For the sediment, δ³⁴S = +10‰, ⁸⁷Sr/⁸⁶Sr = 0.7203, ¹⁴³Nd/¹⁴⁴Nd = 0.511170, 
Sr = 327 p.p.m., Nd = 27 p.p.m. and S = 5,700 p.p.m. Markers on the fits are 
separated by 0.1%, 1.0% and 4.0% for sediment, LCC and SCLM, respectively. 

Neither LOMU basalts (having the lowest ²⁰⁶Pb/²⁰⁴Pb) nor Shona 
samples (HIMU-type, having the highest ²⁰⁶Pb/²⁰⁴Pb) can be 
distinguished from the other samples in the observed trends (Fig. 1). This 
lack of correlation between δ³⁴S and ²⁰⁶Pb/²⁰⁴Pb (Supplementary 
Fig. 5), together with the preservation of the trends between δ³⁴S and 
Sr-Nd isotopes, argues in favour of relatively sulphur-poor HIMU and 
LOMU endmembers, overprinted by the contribution of a sulphur-rich 
enriched-mantle component. In this view, recycled oceanic crust 
carrying the radiogenic ²⁰⁶Pb/²⁰⁴Pb ratio is inferred to be relatively 
sulphur-depleted, consistent with its required high U/Pb. Sulphides are 
indeed the main Pb carrier in the oceanic crust²¹ and any Pb loss during 
subduction should occur through a concomitant sulphur loss. 

The mixing trends being linear, the enriched-mantle component 
must be enriched in sulphur in the same way as it is enriched in 
strontium and neodymium. This allows a reliable estimate of its S/Sr 
and S/Nd, broadly equal to those of the depleted mantle. For simplicity, 
only the S/Sr ratio is used in the following. Taking 200 ± 40 p.p.m. (ref. 22) 
for sulphur and 11.3 p.p.m. (ref. 15) for strontium in depleted mantle, 
we obtain an S/Sr value for enriched mantle of 17 ± 4. Both 
delaminated lithospheric mantle and lower continental crust have been 
suggested to account for enriched-mantle-type signals, especially in south 
Atlantic basalts. These reservoirs are, however, relatively poor in sulphur 
compared to incompatible trace elements, yielding S/Sr ratios of 
3.2 ± 2.0 (ref. 23) and 0.8 ± 0.3 (ref. 24), respectively, that are low 
compared to the required value of 17 ± 4 (see above). Any occurrence 
of such a component in the south Atlantic mantle would have led to 
highly curved mixing relationships in S-Sr isotope space, especially 
given the range of ⁸⁷Sr/⁸⁶Sr displayed by the samples (Fig. 1a). 

Alternatively, sediment may be a good candidate: sediments are 
enriched in trace elements²⁵ and can bear significant amounts of sulphur. 
In particular, many sediments deposited one to two billion years 
ago during the Proterozoic eon formed under reduced conditions and 
usually contain more than 1% of sulphur as pyrite²⁶. A subducted sediment 
containing a realistic sulphur content of 5,700 ± 1,000 p.p.m. 
would satisfy the S/Sr of the enriched-mantle component. The fact 
that our estimate for recycled sediment falls within the large uncertainties 
of non-subducted sediments suggests only moderate devolatilization, 
if any, during recycling (Supplementary Information). The best fits to 
our data are obtained for sediment having a δ³⁴S of +10 ± 3‰, 
consistent with sediments one to two billion years old with an average δ³⁴S 
of +5 ± 10‰ (ref. 27). This Proterozoic age for subducted sediments 
is also consistent with the observed lack of Δ³³S variability²⁸, best fitted 
with a Δ³³S of +0.06 ± 0.03‰ (Supplementary Fig. 6). 
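
The link between S/Sr ratios and the shape of the mixing trend can be illustrated with a two-component mixing sketch. The endmember values are those listed in the Figure 1 caption; everything else (the `Endmember` class, the function names, and the common simplification of weighting isotope ratios by element concentration) is an assumption made for illustration rather than the authors' calculation.

```python
from dataclasses import dataclass

@dataclass
class Endmember:
    d34s: float      # delta-34S in per mil
    sr_ratio: float  # 87Sr/86Sr
    s_ppm: float     # sulphur content, p.p.m.
    sr_ppm: float    # strontium content, p.p.m.

# Values from the Figure 1 caption.
DEPLETED = Endmember(-1.5, 0.7025, 200.0, 11.3)
SEDIMENT = Endmember(10.0, 0.7203, 5700.0, 327.0)
LCC = Endmember(3.0, 0.7080, 408.0, 348.0)
SCLM = Endmember(3.0, 0.7100, 157.0, 49.0)

def mix(a, b, f):
    """Isotope composition of a mixture containing mass fraction f of b in a."""
    s = (1.0 - f) * a.s_ppm + f * b.s_ppm
    sr = (1.0 - f) * a.sr_ppm + f * b.sr_ppm
    # Concentration-weighted averages (approximation for the Sr ratio).
    d34s = ((1.0 - f) * a.s_ppm * a.d34s + f * b.s_ppm * b.d34s) / s
    sr_ratio = ((1.0 - f) * a.sr_ppm * a.sr_ratio + f * b.sr_ppm * b.sr_ratio) / sr
    return d34s, sr_ratio

def curvature_ratio(a, b):
    """(S/Sr) of b divided by (S/Sr) of a; a value of 1 gives a straight mixing line."""
    return (b.s_ppm / b.sr_ppm) / (a.s_ppm / a.sr_ppm)
```

With these inputs the sediment has nearly the same S/Sr as the depleted mantle, so its mixing line is close to straight, whereas the much lower S/Sr of LCC and SCLM produces strongly curved hyperbolae, in line with the argument in the text.
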

If subducted sediments are major ³⁴S carriers to the deep mantle, the 
depleted mantle is shown here to be significantly ³⁴S-depleted with 
respect to chondrites. This low ³⁴S/³²S ratio has fundamental 
implications for our understanding of the origin and processes affecting 
moderately volatile elements in the terrestrial mantle. In the following, we 
use the −1.28 ± 0.33‰ depleted MORB average as the representative 
δ³⁴S of the depleted mantle, but further studies are needed to refine 
this value. Such a negative estimate allows us to discard a purely 
late-veneer origin for sulphur in the mantle, because this scenario requires a 
strictly chondritic ³⁴S/³²S ratio for depleted mantle. According to the 
kinetic theory of gases, partial evaporation of sulphur during accretion 
could have modified the ³⁴S/³²S ratio of the accreted Earth, leaving the 
residual terrestrial mantle enriched in ³⁴S relative to chondrites. This 
would be at odds with the observed depletion in ³⁴S. Finally, assuming 
that the depleted mantle represents 25%-80% of the whole mantle²⁹, 
the complementary surface reservoirs (continents, oceans and altered 
oceanic crust) would need to have an average δ³⁴S of between +8‰ 
and +25‰ to balance the non-chondritic δ³⁴S of the depleted mantle. 
Current estimates, however, yield an average δ³⁴S of −0.4 ± 3.0‰ 
(ref. 30) for the bulk surface reservoirs, a value that cannot be 
reconciled with this requirement and that emphasizes the 
non-chondritic character of the bulk silicate Earth. 

In contrast to immiscible sulphide exsolution, sulphur dissolution into 
metal from silicate could fractionate the ³⁴S/³²S ratio owing to the 
distinct molecular environment of sulphur in a silicate versus a metallic 
liquid (and distinct vibrational partition functions). One likely possibility 
involves the dissolution of sulphur in its metallic form (S⁰) in the core, 
given that stable isotope fractionation theory predicts the more oxidized 
compound to be ³⁴S-enriched compared to sulphur dissolved as S²⁻ in 
the silicate mantle. The α_core-mantle can be inferred from a chondritic bulk 
Earth (that is, mantle and core) by combining the distribution of sulphur 
in the bulk Earth (ref. 1) and the re-evaluated ³⁴S/³²S of the mantle (this 
study), leading to an α_core-mantle of 1.00130 for batch-equilibrium or 
1.00035 for open-system sulphur incorporation into the core 
(Fig. 2). In all cases, the core δ³⁴S is expected to be +0.07‰, almost 
indistinguishable from that of chondrites and iron meteorites, because 
this reservoir would contain most of the terrestrial sulphur¹. 

Figure 2 | Core-mantle sulphur partitioning model in batch equilibrium or 
open system. The δ³⁴S of the mantle is plotted against the sulphur content of 
the mantle, S_mantle, divided by the sulphur content of the bulk Earth, S_bulk-Earth. 
According to ref. 1, S_mantle represents less than 3% of S_bulk-Earth. The sulphur 
isotope compositions of chondrites (refs 3 and 4) and of the mantle (this study) 
are both plotted, allowing a clear visualization of the sulphur isotope shift 
between these two reservoirs. Sulphur isotope and sulphur abundance 
observations coincide at an anchor point constraining the core-mantle 
fractionation. Under batch equilibrium, core δ³⁴S is +1.26‰ higher than the 
silicate value (α_core-mantle = 1.00130). In an open-system model, the core is 
treated as a cumulative product and an α_core-mantle of 1.00035 is sufficient to 
explain the shift between mantle and chondrites. 
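
The two partitioning scenarios can be sketched with textbook mass-balance relations. This is an illustration under stated assumptions, not the authors' model: the bulk Earth is taken as chondritic at δ³⁴S = +0.04‰, the mantle is assumed to hold 3% of Earth's sulphur, α_core-mantle is defined as the core ratio divided by the mantle ratio, and the open system is treated as Rayleigh distillation with the mantle as residue. Function names are hypothetical.

```python
def mantle_d34s_batch(alpha, f_mantle, d_bulk=0.04):
    """Mantle d34S (per mil) after one-step equilibrium between core and mantle.

    alpha    : (34S/32S)_core / (34S/32S)_mantle
    f_mantle : fraction of bulk-Earth sulphur left in the mantle
    """
    numerator = d_bulk - 1000.0 * (1.0 - f_mantle) * (alpha - 1.0)
    return numerator / (f_mantle + (1.0 - f_mantle) * alpha)

def mantle_d34s_open(alpha, f_mantle, d_bulk=0.04):
    """Mantle d34S (per mil) as the residue of Rayleigh removal of sulphur to the core."""
    return (1000.0 + d_bulk) * f_mantle ** (alpha - 1.0) - 1000.0

def core_d34s(d_mantle, f_mantle, d_bulk=0.04):
    """Core d34S (per mil) from mass balance with the mantle and the bulk Earth."""
    return (d_bulk - f_mantle * d_mantle) / (1.0 - f_mantle)
```

Under these assumptions, `mantle_d34s_batch(1.00130, 0.03)` and `mantle_d34s_open(1.00035, 0.03)` both return a mantle δ³⁴S close to −1.2‰, within the uncertainty of the observed −1.28 ± 0.33‰, while `core_d34s` stays within about 0.1‰ of the chondritic value because the core holds nearly all of the sulphur.
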

Only Shahar et al.² experimentally addressed the ³⁴S/³²S fractionation 
between metal and silicate at high temperature, and estimated an 
α_core-mantle of 1.00220 ± 0.0014 at 1,850 °C (ref. 2). For 97% of the 
sulphur occurring in the core¹, such an estimate should lead to a mantle 
δ³⁴S of −2.10 ± 1.40‰ under batch-equilibrium fractionation. This 
value, despite a significant overlap, seems slightly lower than our 
observation. Because stable isotope fractionations scale with temperature T as 
1/T², such distinct estimates could reflect a core segregation occurring at 
a temperature higher than 1,850 °C: the average temperature for 
core-mantle segregation remains highly debated, with values up to 3,300 °C 
(ref. 31). Alternatively, such a distinction may require the involvement 
of ‘hybrid models’, as suggested for highly siderophile elements, in 
which a fraction of the mantle sulphur is a fractionated leftover from 
core-mantle equilibrium, whereas the other fraction would have been 
delivered during the late-accretionary stage. For an α_core-mantle of 
1.00220 (ref. 2), the proportion of late-accreted sulphur (that is, with 
chondritic ³⁴S/³²S) in the mantle would reach approximately 40% of 
mantle sulphur, after batch-equilibrium core segregation. 
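
The two alternatives in this paragraph reduce to simple arithmetic, sketched below. The sketch assumes a chondritic late-accreted component at δ³⁴S = +0.04‰, two-component mass balance on mantle sulphur, and a strict 1/T² scaling of ln α with absolute temperature; the function names are hypothetical and the results are illustrative only.

```python
import math

def late_accreted_fraction(d_observed, d_equilibrated, d_chondrite=0.04):
    """Fraction of mantle sulphur that must be chondritic (late-accreted)
    to raise an equilibrated mantle d34S to the observed value."""
    return (d_observed - d_equilibrated) / (d_chondrite - d_equilibrated)

def temperature_for_alpha(alpha_target, alpha_ref, t_ref_celsius):
    """Temperature (deg C) at which ln(alpha) falls to ln(alpha_target),
    assuming ln(alpha) scales as 1/T^2 from a reference measurement."""
    t_ref_kelvin = t_ref_celsius + 273.15
    scale = math.sqrt(math.log(alpha_ref) / math.log(alpha_target))
    return t_ref_kelvin * scale - 273.15
```

For example, `late_accreted_fraction(-1.28, -2.10)` gives a little under 0.4, matching the proportion of roughly 40% quoted above, and `temperature_for_alpha(1.00130, 1.00220, 1850.0)` returns a temperature above 1,850 °C and below the 3,300 °C upper bound mentioned in the text.
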

At that stage, the most obvious implication of such a core-mantle 
differentiation record is that sulphur and elements of comparable vola- 
tilities (such as zinc, fluorine and lead) must have been present to some 
extent before the late-accretion event. This is a robust and independent 
constraint that will help to build more consistent models of timing for 
the origin of moderately volatile elements in Earth’s mantle. 


METHODS SUMMARY 


Samples were analysed following the protocol described in ref. 7. Glassy rims of 
basalts were handpicked under a binocular microscope and, if needed, cleaned 
by sonication in 99.9% ethanol. For each sample, about 300 mg was crushed 
to a fine powder (<100 µm) and dissolved in a 29 N HF + 2.1 M CrCl₂ solution. 
Under these conditions, sulphur is released as H₂S and subsequently trapped as 
precipitated Ag₂S. Sulphur extraction yields are 101 ± 4% for the 23 samples 
analysed in this study. 

Weighed aliquots of silver sulphide were wrapped in aluminium foil and placed 
in Ni-reaction bombs for fluorination with purified F₂ at 250 °C overnight. The SF₆ 
produced was then purified using both cryogenic separation and gas 
chromatography. The purified SF₆ was then quantified and analysed using a dual-inlet 
ThermoFinnigan MAT 253 mass spectrometer where m/z = 127⁺, 128⁺, 129⁺ 
and 131⁺ ion beams are monitored. All the samples were replicated, yielding 
±0.1, ±0.005 and ±0.1‰ uncertainties for δ³⁴S, Δ³³S and Δ³⁶S, respectively, 
consistent with the basalt internal standards processed in ref. 7. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 


Sample selection and preparation. Studied samples were chosen according to 
their degree of glassiness. They were cleaned in 99.9% ethanol to remove any room 
dust. Phenocrysts were removed under a binocular microscope 
or by magnetic separation, when needed. Samples were then crushed to a grain size 
smaller than 63 µm. 

Electron microprobe measurement. Sulphur contents of MORB glasses were 
determined by electron microprobe (EMP) analyses on polished sections with 
a Cameca SX100 at the CAMPARIS facility (Pierre et Marie Curie University). The 
analytical conditions used were 15 kV accelerating voltage, 100 nA sample current, 
20 µm beam size, and 60 s counting time for each point. Ten spots were analysed 
on each polished section, the calculated mean value being taken as the sulphur 
content of the sample and the standard deviation of each series of measurements 
being taken as the 1σ uncertainty (around 25 p.p.m.). 

A natural pyrite was used as a standard. Results were additionally calibrated 
using two reference samples: sample JDF D2 (high-sulphur-content standard, with 
1,400 p.p.m. sulphur) and ED DR11 1-9 (low-sulphur-content standard, with 
731 p.p.m. sulphur). These two standards were analysed alternately every 3 or 4 
samples. For chlorine contents (see Supplementary Information), the analytical 
conditions used were 25 kV accelerating voltage, 500 nA sample current, 20 µm 
beam size and 100 s counting time for each point. A natural scapolite was used as a 
standard. In addition, results were systematically calibrated using one reference 
sample: the south Atlantic ridge sample EW9309 41D-1, for which the chlorine 
amount has been chemically determined at 55 ± 12 p.p.m. (ref. 32). Ten spots were 
analysed on each polished section. The calculated mean value was taken as the 
chlorine content of the sample and the standard deviation of each series of 
measurements was taken as the 1σ uncertainty (10 p.p.m.). 

Sulphur extraction for isotope determination. Sulphur isotope measurements of 
sulphide inclusions in 300-400 mg of glass separates were performed after chemical 
extraction, by gas source isotope ratio mass spectrometry at the Stable Isotope 
Laboratory at the Institut de Physique du Globe de Paris. Powdered samples were 
transferred to a Teflon apparatus like that in ref. 7, where they underwent sub-boiling 
HF-Cr(II) sulphur extraction. Sulphur released in this process as H₂S was trapped as 
silver sulphide, which was washed, dried and wrapped in clean Al foil. The sulphur 
content measured by EMP represents the bulk sulphur 
content of the sample, whereas our chemical extraction protocol is 
strictly specific to reduced sulphur (see details in ref. 7). Chemical extraction yields 
(that is, the ratio of chemically extracted reduced sulphur to sulphur determined by 
EMP) averaged 101 ± 4% (1σ, n = 23). Such a good match with EMP data supports 
the absence of significant amounts of oxidized sulphur in these basalts, in agreement 
with synchrotron spectroscopic results in worldwide samples. 

The sulphur isotope measurements were then performed using a dual-inlet 
MAT 253 gas-source mass spectrometer. Weighed silver sulphide wrapped in 
aluminium foil was placed into Ni-reaction bombs for fluorination with 250 torr 
of purified F₂ at 250 °C overnight. The SF₆ produced was then purified from 
impurities in a vacuum line using cryogenic methods and subsequently by gas 
chromatography separation. Purified SF₆ was then quantified and analysed in the 
mass spectrometer, where m/z = 127⁺, 128⁺, 129⁺ and 131⁺ ion beams are 
monitored. The quality of the measurements was estimated on the basis of 
long-term reproducibility for IAEA reference materials. Repeated analyses gave δ³⁴S = 
−0.29 ± 0.04‰, Δ³³S = +0.082 ± 0.004‰, Δ³⁶S = −0.91 ± 0.11‰ for IAEA S1 
(all errors are 1σ, n = 43) and δ³⁴S = +22.33 ± 0.06‰, Δ³³S = +0.030 ± 0.006‰, 
Δ³⁶S = −0.17 ± 0.07‰ for IAEA S2 (all errors are 1σ, n = 20). These values are in 
agreement with data reported by other laboratories worldwide. 

For each MORB glass sample, the sulphur isotope measurement was duplicated 
and the standard deviation between the two duplicates was taken as the 1σ 
external uncertainty. The standard δ³⁴S uncertainty is 0.01‰-0.15‰, whereas it is 
approximately 0.010‰ and 0.100‰ for Δ³³S and Δ³⁶S, respectively. 


32. Bonifacie, M. et al. The chlorine isotope composition of Earth’s mantle. Science 
319, 1518-1520 (2008). 




doi:10.1038/nature12443 


Computational design of ligand-binding proteins 
with high affinity and selectivity 
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The ability to design proteins with high affinity and selectivity for 
any given small molecule is a rigorous test of our understanding of 
the physiochemical principles that govern molecular recognition. 
Attempts to rationally design ligand-binding proteins have met with 
little success, however, and the computational design of protein- 
small-molecule interfaces remains an unsolved problem’. Current 
approaches for designing ligand-binding proteins for medical’ and 
biotechnological uses rely on raising antibodies against a target 
antigen in immunized animals** and/or performing laboratory- 
directed evolution of proteins with an existing low affinity for the 
desired ligand*’, neither of which allows complete control over the 
interactions involved in binding. Here we describe a general com- 
putational method for designing pre-organized and shape comple- 
mentary small-molecule-binding sites, and use it to generate protein 
binders to the steroid digoxigenin (DIG). Of seventeen experimentally 
characterized designs, two bind DIG; the model of the higher affinity 
binder has the most energetically favourable and pre-organized inter- 
face in the design set. A comprehensive binding-fitness landscape of 
this design, generated by library selections and deep sequencing, was 
used to optimize its binding affinity to a picomolar level, and X-ray 
co-crystal structures of two variants show atomic-level agreement with 
the corresponding computational models. The optimized binder is 
selective for DIG over the related steroids digitoxigenin, progesterone 
and β-oestradiol, and this steroid binding preference can be reprogrammed
by manipulation of explicitly designed hydrogen-bonding
interactions. The computational design method presented here should 
enable the development of a new generation of biosensors, therapeutics 
and diagnostics. 

Computational design could provide a general approach for creating 
new small molecule binding proteins with rationally programmed 
specificities. Structural and biophysical characterization of previous 
computationally designed ligand-binding proteins revealed numerous 
discrepancies with the design models, however, and it was concluded 
that protein-ligand interaction design is an unsolved problem’*. The 
lack of accuracy in programming protein—small-molecule interactions 
also contributes to low catalytic efficiencies of computationally designed 
enzymes” '*. The development of robust computational methods for the 
design of small-molecule-binding proteins with high affinity and select- 
ivity would have wide-ranging applications. 

We developed a computational method for designing ligand-binding 
proteins with three properties characteristic of naturally occurring 
binding sites: (1) specific energetically favourable hydrogen-bonding 
and van der Waals interactions with the ligand; (2) high overall shape 
complementarity to the ligand; and (3) structural pre-organization in 
the unbound protein state, which minimizes entropy loss upon ligand 
binding’*’*. To program in defined interactions with the small molecule, 


disembodied binding sites are created by positioning amino acid side 
chains around the ligand in optimal orientations and then placed at 
geometrically compatible sites in a set of scaffold protein structures’. 
The surrounding side chain identities and conformations are then optimized
to generate additional protein-ligand and buttressing protein-protein
interactions (Fig. 1a). Designs with protein–small-molecule
shape complementarity below those typical of native complexes’* or 
having interface side chain conformations with low Boltzmann-weighted 
probabilities in the unbound state’® are then discarded. 

We used the method to design proteins that bind the steroid DIG 
(Supplementary Fig. 1), the aglycone of digoxin, a cardiac glycoside used 
to treat heart disease’? and a non-radioactive biomolecular labelling 
reagent”. Anti-digoxigenin antibodies are administered to treat overdoses 
of digoxin, which has a narrow therapeutic window”, and are used to 
detect biomolecules in applications such as fluorescence in situ hybridiza- 
tion”. We created idealized DIG-binding sites featuring hydrogen 
bonds from Tyr or His to the polar groups of DIG and hydrophobic 
packing interactions between Tyr, Phe or Trp and the steroid ring 
system (Fig. 1a). These interactions were embedded in designed binding 
sites with high shape complementarity to DIG, and 17 designs were 
selected for experimental characterization based on computed binding 
affinity, shape complementarity, and the extent of binding site pre- 
organization in the unbound state (Fig. 1b and Supplementary Tables 
1 and 2).

Binding of the designed proteins to DIG was probed by yeast 
surface display” and flow cytometry using DIG-functionalized bovine 
serum albumin (DIG-BSA) or RNase (DIG-RNase). Designed proteins 
DIG5 and DIG10 bound to both labels (Fig. 1c and Supplementary 
Fig. 2), and binding was reduced to background levels when unlabelled 
DIG was added as a competitor (Fig. 1c and Supplementary Fig. 3). 
Fluorescence polarization measurements with purified proteins and 
Alexa488-fluorophore-conjugated DIG (DIG-PEG3-Alexa488) indi- 
cated affinities in the low-to-mid micromolar range, with DIG10 bind- 
ing more tightly (Fig. 2a, b). Isothermal titration calorimetry (ITC) 
measurements confirmed that the affinity of DIG10 for DIG is identical 
to that for DIG-PEG3-Alexa488 (Fig. 2b, Supplementary Fig. 4 and 
Supplementary Table 3). The scaffold from which both DIG5 and
DIG10 derive, a protein of unknown function from Pseudomonas 
aeruginosa (Protein Data Bank (PDB) accession code 1Z1S), does not 
bind to either label (Fig. 1c and Supplementary Fig. 3a) when expressed 
on the yeast surface or to DIG-PEG3-Alexa488 in solution (Fig. 2a),
suggesting that the binding activities of both proteins are mediated 
by the computationally designed interfaces. Indeed, substitution of 
small nonpolar residues in the binding pockets of DIG5 and DIG10
with arginines resulted in complete loss of binding, and mutation of 
the designed hydrogen-bonding tyrosine and histidine residues to 
Figure 1 | Computational design methodology and experimental binding
validation. a, Overview of the design procedure. b, Ranking of experimentally 
characterized DIG designs by computed ligand interaction energy (Rosetta 
energy units, REU) and average (geometric mean) side-chain Boltzmann 
weight of residues designed to hydrogen bond to DIG. DIG10 (red) scores the 
best by both metrics. c, Flow cytometric analysis of yeast cells expressing 
designed proteins. Yeast surface expression and DIG binding were probed by 


phenylalanine reduced (DIG5) or abolished (DIG10) binding (Fig. 1d
and Supplementary Fig. 5). Optimization of DIG10 by site-saturation
mutagenesis and selections using yeast surface display and fluorescence- 
activated cell sorting (FACS) identified several small-to-large hydro- 
phobic amino acid changes that increase binding affinity 75-fold through 
enhanced binding enthalpy, yielding DIG10.1 (Fig. 2b, c, f, Supplemen- 
tary Figs 4, 6—8 and Supplementary Table 3). 

To provide feedback for improving the overall design methodology 
and to evaluate the contribution of each residue in the DIG10.1-binding 
site, we used next-generation sequencing to generate a comprehensive 
binding fitness map****. A library of variants with approximately 1-3 
substitutions at 39 designed interface positions in DIG10.1 was gener- 
ated using doped oligonucleotide mutagenesis, displayed on yeast, and 
subjected to selections using a monovalent DIG-PEG3-biotin conjugate
(Supplementary Fig. 9). Variants with increased affinity for DIG were 
selected by FACS, and next-generation sequencing was used to quantify 
the frequency of every single point mutation in the unselected and 
selected populations. A large majority of the interrogated variants were 
depleted in the selected population relative to the unselected input library, 
suggesting that most of the DIG10.1-binding site residues are optimal for 
binding (Fig. 2d, e and Supplementary Fig. 10). In particular, mutation of 
the three designed hydrogen-bonding residues, Tyr 34, Tyr 101 and 
Tyr 115, to any other amino acid was disfavoured. Several large hydro- 
phobic residues that pack against the ligand in the computational model 




labelling with anti-c-Myc-fluorescein (FITC) and a mixture of biotinylated 
DIG-functionalized BSA and phycoerythrin (PE)-streptavidin, respectively. 
ZZ(—), negative control; ZZ(+), positive control; 1Z1S, original scaffold. 

d, On-yeast substitutions of DIG10-designed interface residues reduce binding 
(phycoerythrin) signals to background negative control levels. See figure 
legends in Supplementary Information for details. 


are also functionally optimal (for example, Phe 66 and Phe 119).
Besides Ala 99, which contacts DIG directly, most of the observed
mutations that improve binding are located in the second coordination 
shell of the ligand and fall into two categories: (1) protein core sub- 
stitutions tolerating mutation to chemically similar amino acids (for 
example, Leu 105 and Cys 23), and (2) solvent-exposed loop residues 
having high sequence entropy (for example, His 90 and Val 92). The 
best clone obtained from sorting the library to homogeneity, DIG10.2, 
contains two of the most highly enriched mutations, Ala37Pro and 
His41Tyr (Fig. 2b, c and Supplementary Figs 4, 6, 8 and 11). 

To increase binding affinity further, we constructed a library in 
which the residues at 11 positions that acquired beneficial substitu- 
tions in the deep sequencing experiment were varied in combination to 
allow for non-additive effects. Selections led to DIG10.3 (Supplemen- 
tary Figs 4, 6, 8 and 12), which binds DIG and its cardiac glycoside 
derivative digoxin with picomolar affinity (Fig. 2b, Supplementary 
Fig. 13 and Supplementary Table 4), rivalling the affinities of anti- 
digoxin antibody therapeutics” and an evolved single-chain variable 
anti-digoxin antibody fragment’. Fluorescence-polarization-based affi- 
nity measurements of DIG10.3 and Tyr knockouts suggest that the 
designed hydrogen bonds each contribute ~2 kcal mol⁻¹ to binding
energy (Supplementary Table 5 and Supplementary Fig. 8). 

The crystal structures of DIG10.2 and DIG10.3 in complex with DIG 
were solved to 2.05 Å and 3.2 Å resolution, respectively (Fig. 3a, b and
Figure 2 | Binding characterization and affinity maturation. a, Equilibrium
fluorescence anisotropy of DIG-PEG3-Alexa488 mixed with purified DIG10
(blue), DIG5 (cyan), 1Z1S scaffold (black) and BSA (red). Error bars represent
s.d. for at least three independent measurements. b, Dissociation constants
(Kd values) of designs. Differences in fluorescence polarization (FP)- and
isothermal titration calorimetry (ITC)-derived Kd values probably result from
enhanced interactions with the linker of DIG-PEG3-Alexa488 used in the
fluorescence polarization experiments. ND, not determined. c, Mutations 
identified during affinity maturation to generate DIG10.1 (blue), DIG10.2 
(orange) and DIG10.3 (green) mapped onto the computational model of 


Supplementary Figs 14—19). The structure of DIG10.2 bound to DIG 
shows atomic-level agreement with the design model (all-atom root 
mean squared deviation (r.m.s.d.) = 0.54 Å; Fig. 3c). The ligand-protein
interface has high shape complementarity (Sc = 0.66) and no water
molecules are observed in the binding pocket. The DIG binding mode 
is nearly identical in the X-ray structure and the computational model, 

with an average r.m.s.d. of 0.99 Å for all ligand heavy atoms (Fig. 3d).

As designed, Tyr 34, Tyr 101 and Tyr 115 hydrogen bond with O3, O2
and O1 of DIG, respectively. Tyr 41, a residue identified during affinity 
maturation, forms a weak hydrogen bond with the terminal hydroxyl 
group of DIG (O5) (Supplementary Fig. 16). Of 27 non-glycine/alanine
residues within ~10 Å of the ligand, 21 adopt the computationally
designed conformations (Supplementary Fig. 17), including Tyr 101 
and Tyr 115 (in chain B) as well as the first-shell packing residues 
Trp 22, Phe 58 and Phe 119. The structure of DIG10.3 bound to DIG 
(Supplementary Fig. 18) also agrees closely with the design model 
(r.m.s.d. = 0.68 Å).

We assessed the binding specificity of DIG10.3 by determining affinities
for a series of related steroids by equilibrium competition fluorescence
polarization. Digitoxigenin, progesterone and β-oestradiol bind
less tightly to DIG10.3 than DIG (Fig. 4a, b and Supplementary Table 4). 
The magnitudes of the affinity decreases are consistent with the loss of one, 
two and three hydrogen bonds, respectively (assuming ~1.8 kcal mol⁻¹
per hydrogen bond”*), suggesting that these compounds bind in the same 
orientation as DIG. We next investigated whether the observed steroid 
selectivity could be reprogrammed by mutagenesis of the key hydrogen- 
bonding tyrosines. The variants Tyr101Phe, Tyr34Phe and Tyr34Phe/ 
Tyr99Phe/Tyr101Phe show clear preferences for more hydrophobic 
DIG10.3. d, Fitness landscape of DIG10.1 showing the effects of single amino
acid substitutions on binding ($\Delta E_i^x$; see equation (1) in Methods). Red and
blue indicate enrichment and depletion, respectively. The original DIG10.1 
amino acid at each position is indicated in bold. White indicates mutations for 
which there were not enough sequences in the unselected library to make a 
statistically significant conclusion about function. e, The optimality of each 
initial DIG10.1 residue type mapped onto the computational model of 
DIG10.1. f, DIG binding thermodynamic parameters determined by ITC. 
ΔG, free binding energy; ΔH, binding enthalpy; −TΔS, binding entropy.

See figure legends in Supplementary Information for details. 


steroids in a predictable manner that depends on the hydrogen-bonding 
capabilities of both the protein and the steroid. Tyr101Phe eliminates the 
DIG-specific hydrogen bond with DIG O2 and provides a more hydro- 
phobic environment that favours the other three steroids (Fig. 4c). 
Tyr34Phe removes a hydrogen bond common to DIG and digitoxigenin, 
thus enhancing the preference for progesterone (Fig. 4d). Tyr34Phe/ 
Tyr99Phe/Tyr101Phe has decreased affinity for DIG and increased affi- 
nity for the more hydrophobic steroids (Fig. 4e). These results confirm 
that the selectivity of DIG10.3 for DIG is conferred through the designed 
hydrogen-bonding interactions and demonstrate how this feature can be 
programmed using positive design alone through the explicit placement 
of designed polar and hydrophobic interactions. 
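
As a rough numerical illustration of the hydrogen-bond argument above, the sketch below converts a binding free-energy difference into a fold change in dissociation constant using ΔΔG = RT ln(fold change). The function names and the temperature of 298.15 K are assumptions made for illustration; they are not taken from the measurement protocol.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3


def fold_change_in_kd(ddg_kcal_per_mol, temperature_k=298.15):
    """Fold change in Kd implied by a free-energy difference.

    The default temperature (25 degrees C) is an assumption of this sketch.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


def ddg_from_fold_change(fold, temperature_k=298.15):
    """Inverse relation: free-energy difference implied by a fold change in Kd."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold)


if __name__ == "__main__":
    # Loss of one, two or three hydrogen bonds at ~1.8 kcal/mol each.
    for n_bonds in (1, 2, 3):
        print(n_bonds, fold_change_in_kd(1.8 * n_bonds))
```

Under these assumptions, each lost hydrogen bond of about 1.8 kcal mol⁻¹ weakens binding by roughly a factor of 20, so the losses compound multiplicatively across the three steroids.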

Comparison of the properties of successful and unsuccessful designs 
provides a test of the hypotheses underlying the design methodology. 
Although all 17 designed proteins had high computed shape comple- 
mentarity to DIG by construction, the DIG10 design, which had the 
highest affinity for DIG, had the most favourable computed protein- 
ligand interaction energy and was predicted to have the most pre- 
organized binding site (Fig. 1b and Supplementary Table 6), suggesting 
that these attributes should continue to be the focus of future design 
methodology development. One potential avenue for obtaining more 
favourable interaction energies would be incorporating backbone flex- 
ibility during design to achieve more tightly packed binding sites: the 
fact that substitution of small hydrophobic interface residues to larger 
ones increased binding affinity indicates that the original DIG10 
design was under-packed. 

The binding fitness landscape in combination with the X-ray co- 
crystal structures highlight the importance of second shell interactions 
Figure 3 | Crystal structures of DIG10.2-DIG and DIG10.3-DIG. a, Surface
representation of the DIG10.2—DIG complex. DIG is shown in magenta. 
DIG10.2 is a dimer and crystallized with four copies in the asymmetric unit. 
b, DIG10.2–DIG 2Fo − Fc omit map electron density showing DIG and
interacting tyrosines contoured at 1.0σ. c, Backbone superposition of the crystal
structure of DIG10.2—DIG (magenta) with the computational model (grey). 
d, DIG10.2—DIG-binding site backbone superposition. e, Comparison of an 
overlay of the four DIG10.2-DIG crystallographic copies (left) with that of 
DIG10.3-DIG chains A, B, C, H and I (right; the density was poorly resolved for 
the other four chains). DIG10.2 Tyr 115 conformation A has a more canonical 
hydrogen-bonding geometry than conformation B, and in DIG10.3 Tyr 115 is 
locked into conformation A by Trp 105. 


in stabilizing binding competent conformations. For example, the 
enriched substitution Leu105Trp (Fig. 2d) causes the adjacent Tyr 115
side chain, which shows conformational variability in DIG10.2, to adopt 
a single conformation in DIG10.3 that makes a more canonical hydro- 
gen bond to DIG than that of the DIG10.2 alternative state (Fig. 3e and 
Supplementary Fig. 19). The calculated side chain pre-organization of all
three hydrogen-bonding tyrosine residues increases from DIG10.2 to 
DIG10.3 (Supplementary Table 7), suggesting that the increased affinity 
may arise in part from a higher proportion of the binding competent 
conformation of apo-DIG10.3 (refs 15, 27). Indeed, ITC studies con- 
firm that entropic as well as enthalpic factors contribute to the enhanced 
binding affinity of DIG10.3 (Fig. 2f and Supplementary Table 3). Similarly,
reduced backbone conformational entropy is probably responsible for 
the increased fitness of substitutions increasing β-sheet propensity at
inter-strand loop positions 90 and 92 (Fig. 2d). That flexibility is selected 
against during affinity maturation suggests that maximizing the free- 
energy gap between binding-competent and alternative states of the bind- 
ing site** by explicitly designing second shell interactions to buttress side 
Figure 4 | Steroid binding selectivity. a, X-ray crystal structure of the
DIG10.3—DIG complex (left) and chemical structures of related steroids 
(right). b, Steroid selectivity profile of DIG10.3 determined by measuring the 
equilibrium effects of unlabelled steroids on the anisotropy of the DIG-PEG3-Alexa488–DIG10.3
complex. Solid lines represent fits of the data to a
competitive binding model and error bars indicate s.d. for at least three
independent measurements. Inhibition constants (Ki values, coloured
numbers) were estimated from inhibitory half-maximal concentrations
(IC50 values) obtained from the fits. c, Steroid selectivity profile of
DIG10.3(Tyr101Phe). Dashed lines show qualitative assessments of the 
inhibitory effects for cases in which the data could not be fit due to overly strong 
inhibition (see Supplementary Methods). d, Steroid selectivity profile of 
DIG10.3(Tyr34Phe). e, Steroid selectivity profile of DIG10.3(Tyr34Phe/ 
Tyr99Phe/Tyr101Phe). 


chains making key ligand contacts should help to achieve high affinity in 
the next generation of computationally designed ligand-binding proteins. 

The binding affinity of DIG10.3 is similar to those of anti-digoxin 
antibodies”’, and because it is stable for extended periods (>3 months) 
at ambient temperatures (Supplementary Fig. 20) and can be expressed 
at high levels in bacteria, it could provide a more cost-effective alterna- 
tive for biotechnological and for therapeutic purposes if it can be made 
compatible with the human immune response. With continued 
improvement in the methodology and feedback from experimental 
results, computational protein design should provide an increasingly 
powerful approach to creating small molecule receptors for synthetic 
biology, therapeutic scavengers for toxic compounds, and robust bind- 
ing domains for diagnostic devices. 


METHODS SUMMARY 


Design calculations were performed using RosettaMatch”’ to incorporate five pre- 
defined interactions to DIG into a set of 401 scaffolds. RosettaDesign” was then 
used to optimize each binding site sequence for maximal ligand-binding affinity.
Designs having low interface energy, high shape complementarity, and high bind- 
ing site pre-organization were selected for experimental characterization. Example 
command lines and full design protocols are given in the Supplementary Data. 

Designs were displayed on the surface of yeast strain EBY100 and examined for 
binding to a mixture of 2.7 µM biotinylated DIG-conjugated BSA or DIG-conjugated
RNase and streptavidin-phycoerythrin on an Accuri C6 flow cytometer. Binding 
clones from yeast-surface displayed libraries based on DIG10 were selected using 
highly avid DIG-BSA or DIG-RNase or monovalent DIG-conjugated biotin on a 
Cytopeia inFlux cell sorter. DIG10.1-derived library DNA was sequenced in
paired-end mode on an Illumina MiSeq. For single mutations having ≥7 counts in the
original input library, a relative enrichment ratio between the input library and each
selected library was calculated”**°. The effect of each amino acid substitution on
binding, $\Delta E_i^x$, was computed with equation (1),

$$\Delta E_i^x = \log_2\left(\frac{f_i^{x,\mathrm{sel}}}{f_i^{x,\mathrm{unsel}}}\right) - \log_2\left(\frac{f_i^{\mathrm{orig,sel}}}{f_i^{\mathrm{orig,unsel}}}\right) \qquad (1)$$

in which $f_i^{x,\mathrm{sel}}$ is the frequency of mutation x at position i in the selected population,
$f_i^{x,\mathrm{unsel}}$ is the frequency in the unselected population, $f_i^{\mathrm{orig,sel}}$ is the frequency of the
original amino acid at position i in the selected population, and $f_i^{\mathrm{orig,unsel}}$ is the
frequency of the original amino acid in the unselected population.

For biochemical assays, proteins were expressed in E. coli Rosetta 2 (DE3) cells 
with a carboxy-terminal tobacco etch virus (TEV) protease-cleavable His6 tag. For
crystallographic analysis of DIG10 variants, a 12-amino-acid structurally disor- 
dered C terminus deriving from the scaffold protein 1Z1S was replaced directly 
with a His6 tag. Binding affinities were determined by equilibrium fluorescence
polarization*’ on a SpectraMax M5e microplate reader by monitoring the anisotropy 
of DIG-conjugated Alexa488 as a function of protein concentration. Equilibrium 
fluorescence polarization competition assays were performed by examining the 
effect of increasing concentrations of unlabelled DIG, digitoxigenin, progesterone 
and β-oestradiol on the anisotropy of designed protein–DIG-conjugated Alexa488
complex. ITC studies were performed on an iTC200 microcalorimeter. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS


General methods. Full details for all computational and experimental methods 
are given in Supplementary Methods. Design calculations were performed using 
RosettaDesign”’. Dissociation constants (Kd values) of designs were determined by
equilibrium fluorescence polarization” and by ITC. Example command lines and 
RosettaScripts*’ design protocols are provided in Supplementary Data. Source 
code is freely available to academic users through the Rosetta Commons agree- 
ment (http://www.rosettacommons.org/). Design models, the scaffold library, and 
scripts for running design calculations are provided on the Baker laboratory 
website. 

Matching. RosettaMatch”’ was used to identify backbone constellations in 401 
protein scaffold structures where a DIG molecule and side chain conformations 
interacting with DIG in a pre-defined geometry could be accommodated. This set 
contained scaffolds previously used for design projects within our laboratory'***, 
as well as structural homologues of a subset of these scaffolds that are known to 
tolerate mutations. Full details are given in Supplementary Methods. 

Rosetta sequence design. Two successive rounds of sequence design were used. 
The purpose of the first was to maximize binding affinity for the ligand**. The goal 
of the second was to minimize protein destabilization due to aggressive scaffold 
mutagenesis while maintaining the binding interface designed during the first 
round. During the latter round, ligand-protein interactions were up-weighted 
by a factor of 1.5 relative to intra-protein interactions to ensure that binding energy
was preserved. Two different criteria were used to minimize protein destabilization:
(1) native scaffold residue identities were favoured by 1.5 REU, and (2) no
more than five residues were allowed to change from residue types observed in a 
multiple sequence alignment (MSA) of the scaffold if (a) these residues were 
present in the MSA with a frequency greater than 0.6, or (b) if the calculated 
ΔΔG for mutation of the scaffold residue to alanine’ was greater than 1.5 REU
in the context of the scaffold sequence. In some design calculations, identities of 
the matched hydrogen-bonding residues were allowed to vary according to the 
MSA and ΔΔG criteria described above. In these cases, designs having fewer than
three hydrogen bonds between the protein and the ligand were discarded. 
Design evaluation. Designs were evaluated on interface energy, ligand solvent 
exposed surface area, ligand orientation, shape complementarity, and apo-protein 
binding site pre-organization. The latter was enforced by explicitly introducing 
second-shell amino acids that bolster the programmed interacting residues using 
Foldit** and by selecting designs having rotamer Boltzmann probabilities'® > 0.1 
for at least one hydrogen-bonding residue (Supplementary Table 6). High shape 
complementarity was enforced by rejecting designs having Sc < 0.5. Shape
complementarity was computed using the CCP4 package v.6.0.2 (ref. 36) using the Sc
program'® and the Rosetta radii library. All designs were evaluated for local
sequence secondary structure compatibility, and those predicted to have backbone 
conformations that varied by >0.8 Å from their native scaffold were rejected (see
Supplementary Methods). 
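
The numerical cut-offs in this paragraph can be read as a simple pass/fail filter. The sketch below is a hypothetical illustration only: the `DesignMetrics` record, its field names and the example values are assumptions, and it covers just the three thresholds stated in the text (Sc of at least 0.5, at least one hydrogen-bonding residue with rotamer Boltzmann probability above 0.1, and backbone deviation of no more than 0.8 Å). It does not reproduce the Rosetta, Foldit or CCP4 calculations that generate these metrics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DesignMetrics:
    """Hypothetical per-design record; all field names are assumptions."""
    name: str
    shape_complementarity: float            # Sc of the protein-ligand interface
    hbond_rotamer_probabilities: List[float]  # one value per hydrogen-bonding residue
    backbone_deviation_angstrom: float      # predicted deviation from the native scaffold


def passes_filters(design, min_sc=0.5, min_rotamer_prob=0.1, max_backbone_dev=0.8):
    """Apply the three numerical thresholds quoted in the text."""
    if design.shape_complementarity < min_sc:
        return False
    # At least one hydrogen-bonding residue must be pre-organized.
    if not any(p > min_rotamer_prob for p in design.hbond_rotamer_probabilities):
        return False
    if design.backbone_deviation_angstrom > max_backbone_dev:
        return False
    return True


if __name__ == "__main__":
    # Made-up values, used only to exercise the filter.
    candidates = [
        DesignMetrics("design_a", 0.62, [0.05, 0.30, 0.02], 0.4),
        DesignMetrics("design_b", 0.45, [0.40, 0.35, 0.20], 0.3),
    ]
    print([d.name for d in candidates if passes_filters(d)])
```

Keeping the thresholds as keyword arguments makes it easy to see how many designs survive when a single criterion is tightened or relaxed.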

General experimental methods. Detailed procedures for the syntheses of DIG-BSA-biotin,
DIG-RNase-biotin, DIG-PEG3-biotin, and DIG-PEG3-Alexa488, as
well as protein expression, purification, crystallization, cloning and mutagenesis 
methods are given in Supplementary Methods. Details about fluorescence polar- 
ization binding assays, ITC, gel filtration analysis, analytical ultracentrifugation 
experiments, and circular dichroism protein stability measurements are also pro- 
vided in Supplementary Methods. 

Yeast surface display. Designed proteins were tested for binding using yeast- 
surface display”. Yeast surface protein expression was monitored by binding of 
anti-c-Myc-FITC to the C-terminal Myc epitope tag of the displayed protein. DIG 
binding was assessed by quantifying the phycoerythrin (PE) fluorescence of the 
displaying yeast population following incubation with DIG-BSA-biotin,
DIG-RNase-biotin, or DIG-PEG3-biotin, and streptavidin-phycoerythrin (SAPE). In
a typical experiment using DIG-BSA-biotin or DIG-RNase-biotin, cells were
resuspended in a premixed solution of PBSF (PBS plus 1 g l⁻¹ of BSA) containing
a 1:100 dilution of anti-c-Myc-FITC, 2.66 µM DIG-BSA-biotin or DIG-RNase-biotin,
and 664 nM SAPE for 2-4 h at 4 °C. Cellular fluorescence was monitored
on an Accuri C6 flow cytometer using a 488-nm laser for excitation and a 575-nm
bandpass filter for emission. Phycoerythrin fluorescence was compensated to
minimize bleed-over contributions from the FITC channel. Competition assays
with free DIG were performed as above except that between 750 µM and 1.5 mM
DIG was added to each labelling reaction mixture. Full details are given in 
Supplementary Methods. 

Affinity maturation. Detailed procedures for constructing and selecting all lib- 
raries, including those for deep sequencing, are provided in Supplementary 
Methods. Yeast surface display library selections were conducted on a Cytopeia 
inFlux cell sorter using increasingly stringent conditions. In all labelling reactions 
for selections, care was taken to maintain at least a tenfold molar excess of label to 
cell surface protein. Cell surface protein molarity was estimated by assuming that 
an attenuance at 600 nm (D600 nm) of 1.0 = 1 × 10⁷ cells ml⁻¹, and that each cell
displays 50,000 copies of protein”. For each round of sorting, we sorted at least 10 
times the theoretical library size. FlowJo software v. 7.6 was used to analyse all data. 
Cell sorting parameters and statistics for all selections are given in Supplementary 
Table 9. 
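
The label-excess rule above reduces to a short calculation. The following sketch uses only the two stated assumptions (an attenuance of 1.0 corresponds to 1 × 10⁷ cells ml⁻¹, and each cell displays 50,000 copies) to estimate the surface protein molarity and the minimum label concentration for a tenfold excess. The function names are hypothetical, and the sketch assumes that cells and label share the same reaction volume.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
CELLS_PER_ML_PER_OD = 1.0e7     # stated assumption: D600 of 1.0 = 1e7 cells per ml
COPIES_PER_CELL = 50_000        # stated assumption: displayed copies per cell


def surface_protein_molarity(od600):
    """Molar concentration of displayed protein in a suspension of given D600."""
    cells_per_litre = od600 * CELLS_PER_ML_PER_OD * 1000.0
    return cells_per_litre * COPIES_PER_CELL / AVOGADRO


def minimum_label_molarity(od600, fold_excess=10.0):
    """Lowest label concentration that keeps the requested molar excess."""
    return fold_excess * surface_protein_molarity(od600)


if __name__ == "__main__":
    od = 1.0
    print(f"surface protein: {surface_protein_molarity(od) * 1e9:.2f} nM")
    print(f"minimum label:   {minimum_label_molarity(od) * 1e9:.2f} nM")
```

At an attenuance of 1.0 the displayed protein is slightly below 1 nM, so a label concentration of about 10 nM or more satisfies the tenfold criterion under these assumptions.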

Next-generation sequencing. Two sequencing libraries based on DIG10.1 were 
assembled by recursive PCR: an amino-terminal library (fragment 1 library) and a 
carboxy-terminal library (fragment 2 library). To introduce mutations, we used 
degenerate PAGE-purified oligonucleotides in which the bases coding for 39 
selected binding site amino acid residues were doped with a small amount of each 
non-native base at a level expected to yield 1-2 mutations per gene (TriLink 
BioTechnologies) (Supplementary Table 10). Yeast cells were transformed with 
DNA insert and restriction-digested pETCON”. Surface protein expression was 
induced” and cells were labelled with anti-c-Myc-FITC and sorted for protein 
expression. Expressing cells were recovered, induced, labelled with 100 nM of
DIG-PEG3-biotin for >3 h at 4 °C and then SAPE and anti-c-Myc-FITC for
8 min at 4 °C, and then sorted. For each library, clones having binding signals
higher than that of DIG10.1 were collected (Supplementary Fig. 9). To reduce 
noise from the first round of cell sorting, the sorted libraries were recovered, 
induced, and subjected to a second round of sorting using the same conditions 
(see Supplementary Methods). 

Library DNA was prepared as described’. Illumina adaptor sequences and 
unique library barcodes were appended to each library pool by PCR amplification 
using population-specific primers (Supplementary Table 11). DNA was sequenced 
in paired-end mode on an Illumina MiSeq using a 300-cycle reagent kit and custom
primers (see Supplementary Methods). Of a total of 5,630,105 paired-end reads,
2,531,653 reads were mapped to library barcodes (Supplementary Table 12). For
each library, paired-end reads were fused and filtered for quality (Phred ≥ 30). The
resulting full-length reads were aligned against DIG10.1 using Enrich**. For single
mutations having ≥7 counts in the original input library, a relative enrichment
ratio between the input library and each selected library was calculated’**».
The effect of each amino acid substitution on binding, $\Delta E_i^x$, was computed with
equation (1),

$$\Delta E_i^x = \log_2\left(\frac{f_i^{x,\mathrm{sel}}}{f_i^{x,\mathrm{unsel}}}\right) - \log_2\left(\frac{f_i^{\mathrm{orig,sel}}}{f_i^{\mathrm{orig,unsel}}}\right) \qquad (1)$$

in which $f_i^{x,\mathrm{sel}}$ is the frequency of mutation x at position i in the selected population,
$f_i^{x,\mathrm{unsel}}$ is the frequency in the unselected population, $f_i^{\mathrm{orig,sel}}$ is the frequency of the
original amino acid at position i in the selected population, and $f_i^{\mathrm{orig,unsel}}$ is the
frequency of the original amino acid in the unselected population.
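
Equation (1) can be sketched in a few lines. The example below is a minimal illustration, not the Enrich pipeline: the function names and the count-based interface are assumptions, and the only rules taken from the text are the log2 ratio of frequencies, normalization to the original amino acid, and the requirement of at least 7 counts in the input library. Mutations absent from the selected population are not handled specially; the sketch raises an error rather than guess at a pseudocount.

```python
import math


def delta_e(f_x_sel, f_x_unsel, f_orig_sel, f_orig_unsel):
    """Equation (1): enrichment of mutation x relative to the original residue."""
    for value in (f_x_sel, f_x_unsel, f_orig_sel, f_orig_unsel):
        if value <= 0:
            raise ValueError("all frequencies must be positive")
    return math.log2(f_x_sel / f_x_unsel) - math.log2(f_orig_sel / f_orig_unsel)


def delta_e_from_counts(x_sel, x_unsel, orig_sel, orig_unsel,
                        total_sel, total_unsel, min_input_counts=7):
    """Compute equation (1) from read counts (hypothetical interface).

    Returns None when the mutation has fewer than `min_input_counts`
    reads in the unselected (input) library.
    """
    if x_unsel < min_input_counts:
        return None
    return delta_e(x_sel / total_sel, x_unsel / total_unsel,
                   orig_sel / total_sel, orig_unsel / total_unsel)


if __name__ == "__main__":
    # Made-up counts: a mutation enriched fourfold while the original
    # residue keeps the same frequency.
    print(delta_e_from_counts(40, 10, 500, 500, 1000, 1000))
```

Positive values indicate enrichment of the substitution relative to the original residue and negative values indicate depletion, matching the red and blue colouring described for the fitness landscape.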

Fluorescence polarization binding assays. Fluorescence-polarization-based 
affinity measurements of designs and their evolved variants were performed as 
described” using Alexa488-conjugated DIG (DIG-PEG3-Alexa488). Fluorescence
anisotropy (r) was measured in 96-well plate format on a SpectraMax M5e microplate
reader (Molecular Devices) with λex = 485 nm and λem = 538 nm using a
515-nm emission cut-off filter. Fluorescence polarization equilibrium competition 
binding assays were used to determine the binding affinities of DIG10.3 and its 
variants for unlabelled DIG, digitoxigenin, progesterone, β-oestradiol and digoxin.
The inhibition constant (Ki) for each protein-ligand interaction was calculated
from the measured total unlabelled ligand producing 50% binding signal inhibition
(I50; see Supplementary Methods) and the Kd of the protein-label interaction
according to a model accounting for receptor-depletion conditions”’. Full details 
are provided in Supplementary Methods. 

ITC. ITC studies were performed on an iTC200 microcalorimeter (MicroCal) at 
25 °C in PBS, pH 7.4. Ligand solutions were prepared by diluting a stock solution 
of DIG (100 mM in 100% dimethylsulphoxide (DMSO)) into the flow-through of 
the last buffer aliquot used to exchange the protein (final DMSO concentrations 
were 1-3%). ITC titration data were integrated and analysed with Origin 7.0 
(MicroCal). Full details are provided in Supplementary Methods. 
De novo mutations in epileptic encephalopathies 


Epi4K Consortium* & Epilepsy Phenome/Genome Project* 


Epileptic encephalopathies are a devastating group of severe child- 
hood epilepsy disorders for which the cause is often unknown. Here 
we report a screen for de novo mutations in patients with two clas- 
sical epileptic encephalopathies: infantile spasms (n = 149) and 
Lennox-Gastaut syndrome (n = 115). We sequenced the exomes of 
264 probands, and their parents, and confirmed 329 de novo muta- 
tions. A likelihood analysis showed a significant excess of de novo 
mutations in the ~4,000 genes that are the most intolerant to func- 
tional genetic variation in the human population (P = 2.9 × 10^-3). 
Among these are GABRB3, with de novo mutations in four patients, 
and ALG13, with the same de novo mutation in two patients; both 
genes show clear statistical evidence of association with epileptic 
encephalopathy. Given the relevant site-specific mutation rates, 
the probabilities of these outcomes occurring by chance are P = 
4.1 × 10^-10 and P = 7.8 × 10^-12, respectively. Other genes with de 
novo mutations in this cohort include CACNA1A, CHD2, FLNA, 
GABRA1, GRIN1, GRIN2B, HNRNPU, IQSEC2, MTOR and 
NEDD4L. Finally, we show that the de novo mutations observed 
are enriched in specific gene sets including genes regulated by the 
fragile X protein (P < 10^-8), as has been reported previously for 
autism spectrum disorders. 

Genetics is believed to have an important role in many epilepsy 
syndromes; however, specific genes have been discovered in only a 
small proportion of cases. Genome-wide association studies for both 
focal and generalized epilepsies have revealed few significant associa- 
tions, and rare copy number variants explain only a few per cent of 
cases. An emerging paradigm in neuropsychiatric disorders is the 
major impact that de novo mutations have on disease risk. We 
searched for de novo mutations associated with epileptic encephalo- 
pathies, a heterogeneous group of severe epilepsy disorders character- 
ized by early onset of seizures with cognitive and behavioural features 
associated with ongoing epileptic activity. We focused on two ‘classical’ 
forms of epileptic encephalopathies: infantile spasms and Lennox- 
Gastaut syndrome, recognizing that some patients with infantile spasms 
progress to Lennox-Gastaut syndrome. 

Exome sequencing of 264 trios (Methods) identified 439 putative de 
novo mutations. Sanger sequencing confirmed 329 de novo mutations 
(Supplementary Table 2), and the remainder were either false posi- 
tives, a result of B-cell immortalization, or in regions where the Sanger 
assays did not work (Supplementary Table 3). 

Across our 264 trios, we found nine genes with de novo single nuc- 
leotide variant (SNV) mutations in two or more probands (SCN1A, 
n = 7; STXBP1, n = 5; GABRB3, n = 4; CDKL5, n = 3; SCN8A, n = 2; 
SCN2A, n = 2; ALG13, n = 2; DNM1, n = 2; and HDAC4, n = 2). Of 
these, SCN1A, STXBP1, SCN8A, SCN2A and CDKL5 are genes that have 
a previously established association with epileptic encephalopathy. 
To assess whether the observations in the other genes implicate them as 
risk factors for epileptic encephalopathies, we determined the probabi- 
lity of seeing multiple mutations in the same gene given the sequence- 
specific mutation rate, size of the gene, and the number and gender of 
patients evaluated in this study (Methods). The number of observed de 
novo mutations in HDAC4 and DNM1 are not yet significantly greater 
than the null expectation. However, observing four unique de novo 
mutations in GABRB3 and two identical de novo mutations in ALG13 
was found to be highly improbable (Table 1 and Fig. 1). We performed 
the same calculations on all of the genes with multiple de novo mutations 
observed in 610 control trios and found no genes with a significant excess 
of de novo mutations (Supplementary Table 4). Although mutations in 
GABRB3 have previously been reported in association with another type 
of epilepsy’*, and in vivo mouse studies suggest that GABRB3 haplo- 
insufficiency is one of the causes of epilepsy in Angelman’s syndrome’, 
our observations implicate it, for the first time, as a single-gene cause of 
epileptic encephalopathies and provide the strongest evidence to date for 
its association with any epilepsy. Likewise, ALG13, an X-linked gene 
encoding a subunit of the uridine diphosphate-N-acetylglucosamine 
transferase, was previously shown to carry a novel de novo mutation in 
a male patient with a severe congenital glycosylation disorder with micro- 
cephaly, seizures and early lethality'’. Furthermore, the same ALG13 
de novo mutation identified in this study was observed as a de novo 
mutation in an additional female patient with severe intellectual disability 
and seizures’®. 

Each trio harboured on average 1.25 confirmed de novo mutations, 
with 181 probands harbouring at least one. Considering only de novo 
SNVs, each trio harboured on average 1.17 de novo mutations (Sup- 
plementary Fig. 1). Seventy-two per cent of the confirmed de novo 
SNV mutations were missense and 7.5% were putative loss-of-function 
(splice donor, splice acceptor, or stop-gain mutations). Compared to rates 
of these classes of mutations previously reported in controls (69.4% mis- 
sense and 4.2% putative loss-of-function mutations), we observed a 
significant excess of loss-of-function mutations in patients with infantile 
spasms and Lennox-Gastaut syndrome (exact binomial P = 0.01), con- 
sistent with data previously reported in autism spectrum disorder. 
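
As a rough illustration of the exact binomial comparison mentioned above, the sketch below computes a one-sided upper-tail probability. The counts are assumptions reconstructed from the reported averages (about 1.17 SNVs per trio across 264 trios gives roughly 309 SNVs, of which 7.5% is roughly 23), not figures taken from the paper, and the authors' test may have been specified differently, so the result need not equal the reported P value.

```python
import math

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )

# Assumed, illustrative counts (see the note above).
n_snvs = 309          # approx. 1.17 de novo SNVs per trio x 264 trios
n_lof = 23            # approx. 7.5% of the SNVs
control_rate = 0.042  # loss-of-function fraction reported in controls

p_value = binomial_upper_tail(n_lof, n_snvs, control_rate)
```

Summing the tail directly keeps the example free of external dependencies; a statistics library would normally be used in practice.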

A framework was recently established for testing whether the dis- 
tribution of de novo mutations in affected individuals differs from the 
general population. Here, we extend the simulation-based approach 
of ref. 8 by developing a likelihood model that characterizes this effect 
and describes the distribution of de novo mutations among affected 
individuals in terms of the distribution in the general population, and a 
set of parameters describing the genetic architecture of the disease. 
These parameters include the proportion of the exome sequence that 
can carry disease-influencing mutations (η) and the relative risk (γ) of 
the mutations (Supplementary Methods). Consistent with what was 
reported in autism spectrum disorder, we found no significant devi- 
ation in the overall distribution of mutations from expected (γ = 1 
and/or η = 0). It is, however, now well established that some genes 
tolerate protein-disrupting mutations without apparent adverse 
phenotypic consequences, whereas others do not. To take this into 
account, we used a simple scoring system that uses polymorphism data 
in the human population to assign a tolerance score to every consid- 
ered gene (Methods). We then found that genes with a known asso- 
ciation with epileptic encephalopathy rank among the most intole- 
rant genes using this scheme (Supplementary Table 8). We therefore 
evaluated the distribution of de novo mutations within these 4,264 
genes that are within the 25th percentile for intolerance and found a 
significant shift from the null distribution (P = 2.9 × 10^-3). The maxi- 
mum likelihood estimate of η (proportion of intolerant genes 
involved in epileptic encephalopathies) was 0.021 and that of γ (relative risk) 
was 81, indicating that there are 90 genes among the intolerant genes 


*A list of authors and affiliations appears at the end of the paper. 
Table 1 | Probability of observing the reported number of de novo mutations by chance in genes recurrently mutated in this cohort 

Gene     Chromosome   Average effectively      Weighted         De novo mutation   P value† 
                      captured length (bp)     mutation rate    number 
SCN1A    2            6,063.70                 1.61 × 10^-4     5‡                 1.12 × 10^-9 *** 
STXBP1   9            1,917.51                 6.44 × 10^-5     5                  1.16 × 10^-11 *** 
GABRB3   15           1,206.86                 3.78 × 10^-5     4                  4.11 × 10^-10 *** 
CDKL5    X            2,798.38                 5.44 × 10^-5     3                  4.90 × 10^-7 ** 
ALG13§   X            475.05                   1.03 × 10^-5     2                  7.77 × 10^-12 *** 
DNM1     9            2,323.37                 9.10 × 10^-5     2                  2.84 × 10^-4 
HDAC4    2            2,649.82                 1.16 × 10^-4     2                  4.57 × 10^-4 
SCN2A§   2            5,831.21                 1.52 × 10^-4     2                  1.14 × 10^-9 *** 
SCN8A    12           5,814.48                 1.64 × 10^-4     2                  9.14 × 10^-4 

† Adjusted α is equivalent to 0.05/18,091 = 2.76 × 10^-6 (*), 0.01/18,091 = 5.53 × 10^-7 (**) and 0.001/18,091 = 5.53 × 10^-8 (***). 
‡ Counts exclude three additional patients with an indel or splice site mutation as these are not accounted for in the mutability calculation. 
§ Two de novo mutations occur at the same position. The probabilities of these special cases are P = 7.77 × 10^-12 and P = 1.14 × 10^-9 for ALG13 and SCN2A, respectively (Methods). 


that can confer risk of epileptic encephalopathies and that each muta- 
tion carries substantial risk. We also found that putatively damaging 
de novo variants in our cohort are significantly enriched in intolerant 
genes compared with control cohorts (Supplementary Methods). 
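
For orientation, the figure of roughly 90 risk genes follows from the two reported quantities: the estimated proportion of intolerant genes involved in epileptic encephalopathies (0.021, written here as $\hat{\eta}$) applied to the 4,264 genes in the intolerant set. The symbol $N$ is introduced here for illustration only.

```latex
\hat{\eta} \times N = 0.021 \times 4{,}264 \approx 89.5 \approx 90
```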

We next evaluated whether the de novo mutations were drawn pre- 
ferentially from six gene sets (Methods and Supplementary Table 10), 
including ion channels, genes known to cause monogenic disorders 
with seizures as a phenotypic feature, genes carrying confirmed de 
novo mutations in patients with autism spectrum disorder and in 
patients with intellectual disability, and FMRP-regulated genes. 
Taking into account the size of regions with adequate sequencing cover- 
age to detect a de novo mutation (Methods), we found significant over- 
representation for all gene lists in our data (Supplementary Table 10), 
and no over-representation in controls. 

To determine possible interconnectivity among the genes carrying 
a de novo mutation, we performed a protein-protein interaction ana- 
lysis and identified a single network of 71 connected proteins (Fig. 2 
and Supplementary Fig. 7). These 71 proteins include six encoded by 
OMIM reported epileptic encephalopathy genes (http://www.omim. 
org/) where we identified one or more de novo mutations among the 
epileptic encephalopathy patients in this study. Genes in this protein- 
protein network were also found to have a much greater probability of 
overlap with the autism spectrum disorder***” and severe intellectual 
disability disorder’*** exome sequencing study genes, and with FMRP- 
associated genes, than genes not in this network (Supplementary Table 11). 

In support of a hypothesis that individual rare mutations in different 
genes may converge on biological pathways, we draw attention to six 
mutations that all affect subunits of the GABA (γ-aminobutyric acid) 
ionotropic receptor (four in GABRB3, and one each in GABRA1 and 
Figure 1 | Heat map illustrating the probability of observing the specified 
number of de novo mutations in genes with the specified estimated 
mutation rate. The number of de novo mutations required to achieve 
significance is indicated by the solid red line. The superimposed dots reflect 
positions of all genes found to harbour multiple de novo mutations in our study. 
GABRB3, SCN1A, CDKL5 and STXBP1 have significantly more de novo 
mutations than expected. The positions indicated for ALG13 and SCN2A reflect 
only the fact that there are two mutations observed, not that there are two 
mutations affecting the same site (Methods). 
GABRB1), and highlight two interactions: HNRNPU interacting with 
HNRNPH1, and NEDD4L (identified here) binding to TNK2, a gene 
previously implicated in epileptic encephalopathies” (Fig. 2). Although 
the HNRNPU mutation observed here is a small insertion/deletion 
variant (indel) in a splice acceptor site, and therefore probably results 
in a modified protein, the HNRNPH1 de novo mutation is synonymous 
and thus of unknown functional significance (Supplementary Table 2). 
Notably, a minigene experiment indicates that this synonymous muta- 
tion induces skipping of exon 12 (Supplementary Methods). 

Evaluation of the clinical phenotypes among patients revealed sig- 
nificant genetic heterogeneity underlying infantile spasms and Lennox- 
Gastaut syndrome, and begins to provide information about the range 
of phenotypes associated with mutations in specific genes (Supplemen- 
tary Table 13). We identified four genes—SCN8A, STXBP1, DNM1 and 
GABRB3—with de novo mutations in both patients with infantile 
spasms and patients with Lennox-Gastaut syndrome. Although infant- 
ile spasms may progress to Lennox-Gastaut syndrome, in three of these 
cases the patients with Lennox-Gastaut syndrome did not initially 
present with infantile spasms, indicating phenotypic heterogeneity 
associated with mutations in these genes yet supporting the notion of 
shared genetic susceptibility. Notably, in multiple patients we identified 
de novo mutations in genes previously implicated in other neurodevelop- 
mental conditions, and in some cases with very distinctive clinical pre- 
sentations (Supplementary Table 12). Most notably, we found a de novo 
mutation in MTOR, a gene recently found to harbour a causal variant 
in mosaic form in a case with hemimegalencephaly. Our patient, how- 
ever, showed no detectable structural brain malformation. Similarly, we 
found one patient with a de novo mutation in DCX and another with a 
de novo mutation in FLNA, previously associated with lissencephaly 
and periventricular nodular heterotopia, respectively*”*; neither patient 
had cortical malformations detected on magnetic resonance imaging. 

In addition to de novo variants, we also screened for highly penetrant 
genotypes by identifying variants that create newly homozygous, com- 
pound heterozygous, or hemizygous genotypes in the probands that are 
not seen in parents or controls (Supplementary Methods). No inherited 
variants showed significant evidence of association. Additional studies 
evaluating a larger number of epileptic encephalopathy patients will be 
required to establish the role of inherited variants in the disease risk 
associated with infantile spasms and Lennox-Gastaut syndrome. 

We have identified novel de novo mutations implicating at least two 
genes for epileptic encephalopathies, and also describe a genetic archi- 
tecture that strongly suggests that we have identified additional causal 
mutations in genes intolerant to functional variation. Given that our 
sample size already shows many genes with recurrent mutations, it is 
clear that even modest increases in sample sizes will confirm many new 
genes now seen in only one of our trios. Our results also emphasize that 
it may be difficult to predict with confidence the responsible gene, even 
among known genes, based upon clinical presentation. This makes it 
clear that the future of genetic diagnostics in epileptic encephalopathies 
will need to focus on the genome as a whole as opposed to single genes 
or even gene panels. In particular, several of the genes with de novo 
mutations in our cohort have also been identified in patients with 
Figure 2 | A protein-protein interaction network of genes with de novo 
mutations found in infantile spasms and Lennox-Gastaut 
syndrome patients studied. The geometric shapes reflect differing protein 
roles, as defined by Ingenuity Pathway Analysis (Ingenuity Systems): enzyme, 
rhombus; ion channel, vertical rectangle; kinase, inverted triangle; ligand- 
dependent nuclear receptor, horizontal rectangle; phosphatase, triangle; 
transcription regulator, horizontal oval; transmembrane receptor, vertical oval; 
transporter, trapezoid; and unknown, circle. Six of the genes found to harbour 


intellectual disability or autism spectrum disorder. Finally, and perhaps 
most importantly, this work suggests a clear direction for both drug 
development and treatment personalization in the epileptic encepha- 
lopathies, as many of these mutations seem to converge on specific 
biological pathways. 


METHODS SUMMARY 


All probands and family members were collected as part of the Epilepsy Phenome/ 
Genome Project (EPGP) cohort”? (Supplementary Table 1). Detailed inclusion and 
exclusion criteria are provided in Methods. Patient collection and sharing of speci- 
mens for research were approved by site-specific Institutional Review Boards. 
We sequenced the exome of each trio, from DNA derived from primary cells 
(n = 224 trios) or from lymphoblastoid cell lines (LCLs) in one or more family 
members (n = 40 trios), using the TruSeq Exome Enrichment kit (Illumina). We 
aligned samples and called variants using established algorithms (Methods) and 
identified candidate de novo variants at sites included in the exons or splice sites of 
the consensus coding sequence (CCDS) as those called in the affected child and 
absent in both parents, despite each parent having at least tenfold coverage at the 
site. Variants created by the de novo mutation also had to be absent in our internal 
controls (n = 436), as well as the approximately 6,500 samples represented in the 
Exome Variant Server (http://evs.gs.washington.edu/EVS), and had to pass visual 
de novo mutations in an infantile spasms or Lennox-Gastaut syndrome patient 
are known OMIM genes associated with epileptic encephalopathy (shaded 
circles). Five additional known OMIM genes associated with epileptic 
encephalopathy that were not found to be mutated in the 264 epileptic 
encephalopathy patients, but are involved in this network, are also shown 
(shaded circles with the gene underlined). The previously identified severe 
infantile epilepsy gene TNK2 is superimposed into this network (red circle). 


inspection of alignment quality. Candidate de novo mutations were confirmed to 
be de novo mutations using Sanger sequencing. In all cases, primary DNA from the 
proband was used for the Sanger confirmation so that mutations appearing in the 
transformation process for the 40 trios sequenced from LCLs would be eliminated. 

To determine whether our list of de novo mutations was preferentially located in 
genes contained in the six gene lists we calculated the proportion of CCDS de novo 
mutation opportunity space for each list (Additional Methods). A binomial prob- 
ability calculation was used to determine whether the de novo mutations in CCDS 
transcripts identified in this cohort of epileptic encephalopathy patients were 
selectively enriched within the coding sequence of genes within a particular gene 
list (Supplementary Table 10). 

Ingenuity Pathway Analysis (Ingenuity Systems) was used to assess the con- 
nectivity of proteins harbouring a de novo mutation. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 


Study subjects. Infantile spasms and Lennox-Gastaut syndrome patients evalu- 
ated in this study were collected through the Epilepsy Phenome/Genome Project 
(EPGP, http://www.epgp.org). Patients were enrolled across 23 clinical sites. 
Informed consent was obtained for all patients in accordance with the site-specific 
Institutional Review Boards. Phenotypic information has been centrally databased 
and DNA specimens stored at the Coriell Institute-NINDS Genetics Repository 
(Supplementary Table 1). Infantile spasms patients were required to have hypsar- 
rhythmia or a hypsarrhythmia variant on EEG. Lennox-Gastaut syndrome patients 
were required to have EEG background slowing or disorganization for age and 
generalized spike and wave activity of any frequency or generalized paroxysmal fast 
activity (GPFA). Background slowing was defined as <8 Hz posterior dominant 
rhythm in patients over 3 years of age, and <5 Hz in patients over 2 years of age. 
EEGs with normal backgrounds were accepted if the generalized spike and wave 
activity was 2.5 Hz or less and/or if GPFA was present. 

All patients were required to have no evidence of moderate-to-severe develop- 
mental impairment or diagnosis of autistic disorder or pervasive developmental 
disorder before the onset of seizures. Severe developmental delay was defined by 
50% or more delay in any area: motor, social, language, cognition, or activities of 
living; or global delay. Mild delay was defined as delay of less than 50% of expected 
milestones in one area, or less than 30% of milestones across more than one area. All 
patients had no confirmed genetic or metabolic diagnosis, and no history of congen- 
ital TORCH infection, premature birth (before 32 weeks gestation), neonatal hypoxic- 
ischaemic encephalopathy or neonatal seizures, meningitis/encephalitis, stroke, 
intracranial haemorrhage, significant head trauma, or evidence of acquired epilepsy. 
All infantile spasms and Lennox-Gastaut syndrome patients had an MRI or CT scan 
interpreted as normal, mild diffuse atrophy or focal cortical dysplasia. (Our case with 
the mutation in HNRNPU had had a reportedly normal MRI but on review of past 
records, a second more detailed MRI was found showing small regions of periven- 
tricular nodular heterotopia.) To participate, both biological parents had to have no 
past medical history of seizures (except febrile or metabolic/toxic seizures). 

A final diagnosis form was completed by the local site EPGP principal investi- 
gator based on all collected information. A subset of cases was reviewed indepen- 
dently by two members of the EPGP Data Review Core to ensure data quality 
and consistency. All EEGs were reviewed by a site investigator and an EEG core 
member to assess data quality and EEG inclusion criteria. EEGs accepted for 
inclusion were then reviewed and scored by two EEG core members for specific 
EEG phenotypic features. Disagreements were resolved by consensus conference 
among two or more EEG core members before the EEG data set was finalized. MRI 
scans were reviewed by local investigators and an MRI core member to exclude an 
acquired symptomatic lesion. 

Exome-sequenced unrelated controls (n = 436) used to ascertain mutation 
frequencies were sequenced in the Center for Human Genome Variation as part 
of other genetic studies. 
Exome sequencing, alignment and variant calling. Exome sequencing was carried 
out within the Genomic Analysis Facility in the Center for Human Genome Variation 
(Duke University). Sequencing libraries were prepared from primary DNA extracted 
from leukocytes of parents and probands using the Illumina TruSeq library prepara- 
tion kit following the manufacturer's protocol. The Illumina TruSeq Exome Enrichment 
kit was used to selectively amplify the coding regions of the genome according to the 
manufacturer’s protocol. Six individual barcoded samples (two complete trios) were 
sequenced in parallel across two lanes of an Illumina HiSeq 2000 sequencer. 

Alignment of the sequenced DNA fragments to Human Reference Genome (NCBI 
Build 37) was performed using the Burrows-Wheeler Alignment Tool (BWA) (ver- 
sion 0.5.10). The reference sequence we use is identical to the 1000 Genomes Phase II 
reference and it consists of chromosomes 1-22, X, Y, MT, unplaced and unlocalized 
contigs, the human herpesvirus 4 type 1 (NC_007605), and decoy sequences (hs37d5) 
derived from HuRef, Human Bac and Fosmid clones and NA12878. 

After alignments were produced for each individual separately using BWA, 
candidate de novo variants were jointly called with the GATK Unified Genotyper 
for all family members in a trio. Loci bearing putative de novo mutations were 
extracted from the variant call format files (VCFs) that met the following criteria: 
(1) the read depth in both parents should be greater than or equal to 10; (2) the 
depth of coverage in the child should be at least one-tenth of the sum of the coverage 
in both parents; (3) for de novo variants, less than 5% of the reads in either parent 
should carry the alternate allele; (4) at least 25% of the reads in the child should carry 
the alternate allele; (5) the normalized, phred-scaled likelihood (PL) scores for the 
offspring genotypes AA, AB and BB, where A is the reference allele and B is the 
alternate allele, should be >20, 0 and >0, respectively; (6) the PL scores for both 
parents should be 0, >20 and >20; (7) at least three variant alleles must be observed 
in the proband; and (8) the de novo variant had to be located in a CCDS exon 
targeted by the exome enrichment kit. PL scores are assigned such that the most 
likely genotype is given a score of 0, and the scores for the other two genotypes 
represent the likelihood that they are not the true genotypes. SnpEff (version 3.0a) 
was used to annotate the variants according to Ensembl (version 69) and consensus 
coding sequence (CCDS release 9, GRCh37.p5), and analyses were limited to exonic 
or splice site (2 bp flanking an exon) mutations. All candidate de novo mutations 
that were absent from population controls, including a set of 436 internally 
sequenced controls and the ~6,500 individuals whose single nucleotide variant data 
are reported in the Exome Variant Server, NHLBI Exome Sequencing Project (http:// 
evs.gs.washington.edu/EVS/; August 2012) were also visually inspected using 
Integrative Genomics Viewer (IGV). All candidate de novo mutations were con- 
firmed with Sanger sequencing of the relevant proband and parents. For compar- 
ison, we also called de novo variants from probands and parents individually for a 
subset of trios. Using this individual calling approach we identified and confirmed 
an additional 46 de novo mutations. These were included in all the downstream 
de novo mutation analyses. 
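
The eight site-level criteria listed above can be expressed as a single predicate. The sketch below is a minimal illustration under stated assumptions: the `SampleCall` record and the function name are hypothetical, the per-sample values are assumed to have been parsed from the trio VCF beforehand, and the code is not the pipeline used in the study.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SampleCall:
    """Hypothetical per-sample summary at one site."""
    depth: int                 # total read depth
    alt_reads: int             # reads carrying the alternate allele
    pl: Tuple[int, int, int]   # normalized PL scores for AA, AB, BB

def is_candidate_de_novo(child, mother, father, in_targeted_ccds_exon):
    """Apply criteria (1)-(8) for a putative de novo variant."""
    parents = (mother, father)
    # (1) read depth of at least 10 in both parents
    if any(p.depth < 10 for p in parents):
        return False
    # (2) child depth at least one-tenth of the summed parental depth
    if child.depth < 0.1 * (mother.depth + father.depth):
        return False
    # (3) fewer than 5% of reads in either parent carry the alternate allele
    if any(p.alt_reads / p.depth >= 0.05 for p in parents):
        return False
    # (4) at least 25% of reads in the child carry the alternate allele
    if child.alt_reads / child.depth < 0.25:
        return False
    # (5) child PL scores for AA, AB, BB: >20, 0 and >0
    aa, ab, bb = child.pl
    if not (aa > 20 and ab == 0 and bb > 0):
        return False
    # (6) parental PL scores for AA, AB, BB: 0, >20 and >20
    for p in parents:
        aa, ab, bb = p.pl
        if not (aa == 0 and ab > 20 and bb > 20):
            return False
    # (7) at least three variant alleles observed in the proband
    if child.alt_reads < 3:
        return False
    # (8) site lies in a CCDS exon targeted by the enrichment kit
    return bool(in_targeted_ccds_exon)
```

Checking the parental depth first means the later ratio tests never divide by zero.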

Calculation of gene-specific mutation rate. Point mutation rates were scaled to 
per base pair, per generation, based on the human genome sequences matrix 
(provided by S. Sunyaev and P. Polak), and the known human average genome 
de novo mutation rate (1.2 × 10^-8). The mutation rate (M) of each gene was 
calculated by adding up point mutation rates in effectively captured CCDS regions 
in the offspring of trios, and then dividing by the total trio number (S = 264). The 
P value was calculated as [1 − Poisson cumulative distribution function (x − 1, λ)], 
where x is the observed de novo mutation number for the gene, and λ is calculated 
as 2SM for genes on autosomes or (2f + m)M for genes on chromosome X (f and m 
are the number of sequenced female and male probands, respectively). Genes on the Y 
chromosome were not part of these analyses. Two de novo mutations in gene 
ALG13 are at the same position, likewise in gene SCN2A. We calculated the 
probability of this special case as [1 − Poisson cumulative distribution function 
(1, (2f + m)r)], where r reflects the point mutation rate on that specific de novo 
mutation position. Further investigations indicated that it is unlikely for these de 
novo mutations, which occur at the same site across distinct probands, to have been 
caused by sequencing or mapping errors (Supplementary Methods). 
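
The Poisson calculation described above amounts to an upper-tail probability, which can be sketched as follows. The function name is hypothetical, and the example value of λ is an assumption (264 trios multiplied by a weighted rate of 3.78 × 10^-5, which is one reading of how the GABRB3 entry in Table 1 scales to the cohort), not a figure stated in the text.

```python
import math

def poisson_upper_tail(x, lam, extra_terms=60):
    """P(X >= x) for X ~ Poisson(lam), equal to 1 - CDF(x - 1).

    The tail is summed directly instead of subtracting the CDF from 1,
    because that subtraction loses precision for very small P values.
    Truncating after `extra_terms` terms is adequate for small lam.
    """
    if x < 0 or lam < 0:
        raise ValueError("x and lam must be non-negative")
    if lam == 0:
        return 1.0 if x == 0 else 0.0
    # First term of the tail, computed in log space for stability.
    term = math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))
    total = 0.0
    for k in range(x, x + extra_terms):
        total += term
        term *= lam / (k + 1)   # next Poisson term from the previous one
    return min(total, 1.0)

# Assumed input: four observed mutations, lam = 264 * 3.78e-5.
p_gabrb3 = poisson_upper_tail(4, 264 * 3.78e-5)   # on the order of 4e-10
```

For genes on chromosome X, the same function would be called with λ computed from the numbers of female and male probands as described in the text.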
Calculation of mutation tolerance for HGNC genes. To assign quantitatively a 
mutation tolerance score to genes in the human genome (HGNC genes), we calcu- 
lated an empirical penalty based on the presence of common functional variation 
using the aggregate sequence data available from the 6,503 samples reported in the 
Exome Variant Server, NHLBI Exome Sequencing Project (http://evs.gs.washington. 
edu/EVS/; accessed August 2012). We first filtered within the EVS database and 
eliminated from further consideration genes where the number of tenfold average 
covered bases was less than 70% of its total extent. In calculating a score, we focused 
on departures from the average common functional variant frequency spectrum, 
corrected for the total mutation burden in a gene. We construct this score as follows. 
Let Y be the total number of common, minor allele frequency >0.1%, missense and 
nonsense (including splice) variants and let X be total number of variants (including 
synonymous) observed within a gene. We regress Y on X and take the studentized 
residual as the score (S). Thus, the raw residual is divided by an estimate of its 
standard deviation, which accounts for differences in variability that come with 
differing mutational burdens. S measures the departure from the average number of 
common functional mutations found in genes with a similar amount of mutational 
burden. Thus, when S = 0 the gene has the average number of common functional 
variants given its total mutational burden. Genes where S < 0 have fewer common 
functional variants than average for their mutational burden and thus would seem to be less 
tolerant of functional mutation, indicating the presence of weak purifying selection. 
We further investigated how different ‘intolerance’ thresholds of S captured known 
epileptic encephalopathy genes (Supplementary Table 8). Supplementary Fig. 6 illu- 
strates how different percentiles of S lead to the classification of different proportions 
of the known epileptic encephalopathy genes as ‘intolerant’. Note that ARX is not in 
these analyses as this gene did not meet a 70% of gene coverage threshold. The dashed 
vertical line in Supplementary Fig. 6 illustrates the 25th percentile of S and shows that 
using this threshold results in 12 out of the 14 assessed known genes being considered 
‘intolerant’. On the basis of this analysis, we used this 25th percentile threshold in 
classifying genes as intolerant in all subsequent analyses. Supplementary Table 9 lists 
the 25th percentile of most intolerant genes that had Sanger confirmed de novo 
mutations among the infantile spasms/Lennox-Gastaut syndrome probands. 
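
The tolerance score described above, a studentized residual from regressing Y on X, can be sketched without external libraries. This is a minimal illustration that assumes internally studentized residuals from a simple least-squares fit with an intercept; the text does not specify which variant of studentization was used, and the function name and toy data are hypothetical.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals from regressing y on x.

    x: total variant counts per gene; y: common functional variant counts.
    Each raw residual is divided by an estimate of its standard deviation,
    s * sqrt(1 - h_i), where h_i is the leverage of observation i.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - 2)   # residual variance
    if s2 == 0:
        return [0.0] * n                            # perfect fit
    scores = []
    for xi, e in zip(x, residuals):
        leverage = 1.0 / n + (xi - mean_x) ** 2 / sxx
        scores.append(e / math.sqrt(s2 * (1.0 - leverage)))
    return scores

# Toy data: the fourth gene has fewer common functional variants than its
# total variant count predicts, so its score is negative.
scores = studentized_residuals([10, 20, 30, 40, 50], [5, 10, 15, 12, 25])
```

Genes would then be ranked by score, with the lowest quartile treated as intolerant under the threshold chosen in the text.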
Defining the CCDS opportunity space for detecting de novo mutations. For 
each trio, we defined callable exonic bases that had the opportunity for identifica- 
tion of a coding de novo mutation, by restricting to bases where each of the three 
family members had at least tenfold coverage, obtained a multi-sampling (GATK) 
raw phred-scaled confidence score of ≥20 in the presence or absence of a variant, 
and were within the consensus coding sequence (CCDS release 9, GRCh37.p5) or 
within the two base pairs at each end of exons to allow for splice acceptor and 
donor variants. Using these three criteria, the average CCDS-defined de novo 
mutation opportunity space across 264 trios was found to be 28.84 ± 0.92 Mb 
(range of 25.46-30.25 Mb). 
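
The three criteria above define, for each trio, how many bases were callable. A minimal sketch of that count is given below, assuming per-base inputs have already been extracted; the function name, the input layout and the thresholds as default arguments are illustrative assumptions rather than the study's implementation.

```python
def count_callable_bases(depths, quals, in_ccds, min_depth=10, min_qual=20):
    """Count bases with the opportunity for a de novo call in one trio.

    depths:  per-base tuples of (child, mother, father) read depth
    quals:   per-base multi-sample phred-scaled confidence scores
    in_ccds: per-base flags, True if the base lies within CCDS or within
             the two base pairs flanking an exon
    """
    if not (len(depths) == len(quals) == len(in_ccds)):
        raise ValueError("inputs must have the same length")
    callable_count = 0
    for trio_depth, qual, inside in zip(depths, quals, in_ccds):
        covered = all(d >= min_depth for d in trio_depth)
        if covered and qual >= min_qual and inside:
            callable_count += 1
    return callable_count

# Toy example with four bases; only the first satisfies all three criteria.
n_callable = count_callable_bases(
    [(12, 15, 11), (9, 20, 20), (30, 30, 30), (10, 10, 10)],
    [25, 30, 19, 20],
    [True, True, True, False],
)
```

Summing this count over all targeted bases in a trio gives that trio's opportunity space in base pairs.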
To explore at the gene level, we similarly assessed the de novo calling oppor- 
tunity within any given trio for every gene with a CCDS transcript. For genes with 
instances of non-overlapping CCDS transcripts, we merged the corresponding 
regions into a consensus summary of all CCDS-defined bases for that gene. 
Using these criteria, over 85% of the CCDS-defined exonic regions were sequenced 
to at least tenfold coverage across the three family members in over 90% of trios. 
All 264 trios covered at least 79% of the CCDS-defined regions under the CCDS 
opportunity space criteria. 
Calculations of CCDS opportunity space for calling a de novo mutation, aside 
from the Y chromosome, were used in both the gene-list enrichment and archi- 
tecture calculations. 
Induction of mouse germ-cell fate by transcription factors in vitro 


Fumio Nakaki, Katsuhiko Hayashi, Hiroshi Ohta, Kazuki Kurimoto, Yukihiro Yabuta & Mitinori Saitou 


The germ-cell lineage ensures the continuity of life through the 
generation of male and female gametes, which unite to form a toti- 
potent zygote. We have previously demonstrated that, by using cyto- 
kines, embryonic stem cells and induced pluripotent stem cells can 
be induced into epiblast-like cells (EpiLCs) and then into primordial 
germ cell (PGC)-like cells with the capacity for both spermatogenesis 
and oogenesis, creating an opportunity for understanding and 
regulating mammalian germ-cell development in both sexes in vitro. 
Here we show that, without cytokines, simultaneous overexpression 
of three transcription factors, Blimp1 (also known as Prdm1), 
Prdm14 and Tfap2c (also known as AP2γ), directs EpiLCs, but 
not embryonic stem cells, swiftly and efficiently into a PGC state. 
Notably, Prdm14 alone, but not Blimp1 or Tfap2c, suffices for the 
induction of the PGC state in EpiLCs. The transcription-factor-induced 
PGC state, irrespective of the transcription factors used, reconsti- 
tutes key transcriptome and epigenetic reprogramming in PGCs, 
but bypasses a mesodermal program that accompanies PGC or 
PGC-like-cell specification by cytokines including bone morphoge- 
netic protein 4. Notably, the transcription-factor-induced PGC-like 
cells contribute to spermatogenesis and fertile offspring. Our find- 
ings provide a new insight into the transcriptional logic for PGC 
specification, and create a foundation for the transcription-factor- 
based reconstitution and regulation of mammalian gametogenesis. 

In mice, PGCs, the precursors for spermatozoa and oocytes, arise in 
the epiblasts in response to cytokines, including bone morphogenetic 
protein 4 (BMP4), from extraembryonic tissues. Gene-knockout studies 
have identified transcription factors (TFs) essential for PGC specification. 
However, the TFs sufficient for PGC induction and the precise mech- 
anism of action of key TFs remain unknown. Using the in vitro PGC 
specification system, we set out to identify TFs whose forced expression 
may be sufficient to confer the PGC fate onto EpiLCs. We focused 
on the three TFs BLIMP1 (hereafter referred to as B), PRDM14 (P14) 
and TFAP2C (A) because they show specific expression in PGCs and 
are necessary for PGC specification. We derived embryonic stem 
(ES) cells expressing mVenus and enhanced cyan fluorescent protein 
(eCFP) under the control of Blimp1 and stella (also known as Dppa3 
and PGC7) regulatory elements, respectively (Blimp1-mVenus (BV) 
and stella-eCFP (SC), or BVSC), and reverse tetracycline transactivator 
(rtTA) under the control of the constitutively active Rosa26 
locus (BVSCR26rtTA ES cells) (Fig. 1a). During mouse development, 
Blimp1 expression signifies the onset of PGC specification, whereas 
stella begins expression in the established PGCs. We transfected the 
BVSCR26rtTA ES cells (XY karyotype) (Supplementary Fig. 1a-c) 
with piggyBac transposon-based vectors expressing Blimp1, Prdm14 
or Tfap2c under the control of tetracycline regulatory elements to 
isolate BVSCR26rtTA ES cells bearing transgenes for all three TFs 
(BP14A), or two (BP14, BA and P14A) or one (B, P14 and A) of the 
three TFs (Fig. 1a and Supplementary Fig. 1d, e). 

We first examined the effect of simultaneous forced expression of 
the three TFs. We induced BP14A cells (line 3-3) into EpiLCs, and 
generated floating aggregates of ~2,000 EpiLCs in the absence of relevant 
cytokines with or without doxycycline (Dox; 1.5 µg ml⁻¹), a tetracycline 
analogue (Supplementary Fig. 2a). The aggregates without Dox did not 
show BVSC expression over the six-day period (Fig. 1b). By contrast, 
remarkably, those with Dox exhibited robust BVSC expression as early 
as day 2 of the Dox treatment (Fig. 1b). We confirmed that Dox induces 
exogenous TFs rapidly and at high levels in nearly all EpiLCs (Sup- 
plementary Fig. 2b, c). Fluorescence activated cell sorting (FACS) revealed 
that at day 2, more than ~80% and ~30% of the cells expressed BV and 
SC, respectively (Fig. 1c), and the efficiency of BVSC induction was 
dependent on the Dox dosage (Supplementary Fig. 2d, e). The BVSC 
induction by Dox was much more efficient/faster than that by the 
cytokines (Supplementary Fig. 2f). The Dox-induced BVSC-positive 
cells showed proliferation and persisted until day 4, but decreased 
thereafter (Fig. 1b and Supplementary Fig. 2g). The other BP14A lines 
showed similar BVSC induction by Dox (Supplementary Fig. 3a, b). As 
in the case of the induction by cytokines, the Dox-induced EpiLCs 
adhering to the culture dish tended to detach, and the remaining cells 
did not activate BVSC as efficiently (data not shown). Thus, under the 
floating aggregate condition, the three TFs are sufficient for robust 
activation of the BVSC transgenes in EpiLCs. 

We analysed the expression of genes relevant for PGC specification 
in TF (BP14A)-induced BV-positive cells (including SC-positive cells; 
Fig. 1c) by quantitative PCR (qPCR). Like cytokine-induced BV-positive 
cells, TF-induced BV-positive cells upregulated key genes for PGC 
specification/development (Blimp1, Prdm14, Tfap2c, Nanos3, stella, 
Pou5f1, Sox2 and Nanog) and downregulated epigenetic modifiers 
(Dnmt3a and Dnmt3b) (Fig. 1d). Interestingly, unlike cytokine-induced 
cells and PGCs in vivo, TF-induced cells did not show transient upregulation 
of mesodermal genes such as Hoxa1, Hoxb1 and T (also known 
as brachyury), but continued to express them at low/undetectable levels 
(Fig. 1d and Supplementary Fig. 3d), indicating that the TF (BP14A)- 
induced BV-positive cells acquire a transcriptional program similar to 
PGCs, but lack a mesodermal program. 

We next examined whether the forced expression of two or one of 
the three TFs would induce BVSC (three lines for each). We found that 
P14A, and to a lesser extent, BP14 and BA, and, notably, P14 alone, 
activated BVSC, although all at lower efficiencies than all three TFs 
combined (BP14A) (Supplementary Fig. 3a-c). The BA-induced aggregates 
(two out of three lines) looked fragile and remained small (Supplemen- 
tary Fig. 3a), and B or A alone did not activate BVSC (Supplementary 
Fig. 3a-c). We confirmed that all the lines showed uniform induction 
of exogenous TF(s) after Dox treatment (Supplementary Fig. 4a-f). It 
should be noted that we could not isolate B cells that express exogenous 
Blimp1 after Dox treatment at levels as high as those of exogenous 
Prdm14 in P14 cells (Supplementary Fig. 4c-e). The two TFs (P14A, 
BP14, BA)- and single TF (P14)-induced BV-positive cells (including 
SC-positive cells (Supplementary Fig. 3c)) exhibited gene expression 
dynamics similar to those of BP14A-induced BV-positive cells (Fig. 1d 
and Supplementary Fig. 4g), suggesting that once the key regulators for 
PGC specification are activated, the induced cells acquire similar transcriptional 
profiles. We determined the relationship between the BVSC 
induction rate and the exogenous TF expression level. As shown in 
Fig. 1e (data based on Supplementary Figs 3b and 4c), BP14A induced 
BVSC much more efficiently than P14A or BA or P14 at similar whole 
exogenous TF levels, indicating that BP14A shows a synergistic effect 
on the activation of PGC-like transcriptional profiles in EpiLCs. 

To exclude the possibility that the TFs activate BMP4 signalling, 
which in turn induces EpiLCs into a PGC-like state, we induced BP14A 
in EpiLCs with an inhibitor for BMP4 signalling—LDN193189, which 
inhibits activin receptor-like kinase 2/3. LDN193189 blocked BV, 
Blimp1 and Prdm14 induction by BMP4, but not by BP14A (Supplemen- 
tary Fig. 5a, b). These findings, together with the much faster kinetics of 
BVSC induction by the TFs than by the cytokines (Fig. 1b and Sup- 
plementary Fig. 2f), indicate that the TFs directly activate a PGC program. 
We then examined whether induction of a PGC-like state by TFs requires 
an EpiLC state. Although BP14A induced EpiLCs robustly into a PGC- 
like state, BP14A induction in ES cells resulted in a peculiar phenotype: 
intense SC activation with no BV (Supplementary Fig. 5c, d), demon- 
strating that a proper cellular context (for example, a proper epigenetic 
background, presence/absence of trans-acting factors) is essential for 
the robust induction of a PGC-like state by TFs. 

To characterize further the TF-induced PGC-like cells (hereafter 
referred to as TF-PGCLCs), we determined their transcriptomes (BV- 
positive cells (including SC-positive cells) induced by BP14A (days 2 
and 4), BP14 (day 2), P14A (day 2) and P14 (day 2) (Supplementary 
Fig. 6a)) and compared them with those of PGCs at embryonic day (E) 
9.5 (ref. 1) and cytokine-induced day 2, 4 and 6 PGCLCs (hereafter 
referred to as Ck-PGCLCs). Principal component analysis (PCA) 
revealed that all the TF-PGCLCs, irrespective of the TF combinations 
or of the induction period, bear similar transcriptomes, which are also 
similar to the transcriptomes of day 4 and 6 Ck-PGCLCs and, to a 
lesser extent, of E9.5 PGCs (Fig. 2a and Supplementary Table 1), indicating 
that the exogenous TFs, at varying efficiencies depending on 
their combinations, create a similar PGC-like transcriptome. Notably, 
day 2 Ck-PGCLCs showed a more similar transcriptome to E5.75 epiblasts 
than to TF-PGCLCs, day 4 and 6 Ck-PGCLCs, and E9.5 PGCs, 
indicating that day 2 Ck-PGCLCs represent a transient state towards 
the acquisition of a PGC-like state from the EpiLC/epiblast states 
(Fig. 2a). The transcriptomes of the SC-positive cells induced by 
BP14A in ES cells (SC ES cells by BP14A) (Supplementary Fig. 5c, d) 
were different from those of PGCLCs/PGCs, and closer to those of ES 
cells (Fig. 2a). 

Figure 1 | Induction of a PGC-like state by TFs. a, Scheme for the 
BVSCR26rtTA cells and the piggyBac transposon-based vectors with 
tetracycline-responsive promoters driving TF and β-geo expression. 
b, Induction of BVSC with (top) or without (bottom) Dox (1.5 µg ml⁻¹) 
in floating aggregates of EpiLCs induced from BP14A cells during the 
6-day period. BF, bright field. Scale bar, 200 µm. c, FACS analysis of 
BVSC induction by Dox (1.5 µg ml⁻¹) in floating aggregates of EpiLCs 
from BP14A cells (line 3-3) (top) or parental BVSCR26rtTA cells 
(bottom). d, Expression of the indicated genes measured by qPCR in 
TF (BP14A)- and cytokine-induced BV-positive cells (red and orange, 
respectively) or in the whole aggregates of parental BVSCR26rtTA 
EpiLCs with Dox (grey) during the 4-day period. For each gene, the ΔCt 
from the average Ct values of Arbp and Ppia is shown (log2 scale, the 
mean value of two independent experiments with two technical 
replicates). Cks, cytokines; ND, not detectable. e, BVSC induction rates 
(%) at day 2 plotted against whole exogenous TF transcript levels (ΔCt) 
12 h after induction in several independent clones of the indicated cells. 
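
As a rough illustration of the PCA step, the sketch below extracts the first principal component of a small samples-by-probes matrix by power iteration. It is written in pure Python purely for illustration; the paper's Methods report that the analysis was done in R, and the toy values and function name here are made up.

```python
from typing import List, Tuple


def first_principal_component(
    samples: List[List[float]], iterations: int = 200
) -> Tuple[List[float], List[float]]:
    """Return (loading vector, per-sample scores) for the first component."""
    n, p = len(samples), len(samples[0])
    # Centre every probe (column) on its mean across samples.
    means = [sum(s[j] for s in samples) / n for j in range(p)]
    centred = [[s[j] - means[j] for j in range(p)] for s in samples]
    v = [1.0] * p  # deterministic, non-zero starting vector
    for _ in range(iterations):
        # w = X^T (X v), proportional to the covariance matrix applied to v.
        proj = [sum(row[j] * v[j] for j in range(p)) for row in centred]
        w = [sum(centred[i][j] * proj[i] for i in range(n)) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # no variance in the data
        v = [x / norm for x in w]
    scores = [sum(row[j] * v[j] for j in range(p)) for row in centred]
    return v, scores


# Four toy samples x three probes (log2 values); two obvious groups.
toy = [[8.0, 7.0, 9.0], [8.1, 7.2, 9.0], [12.0, 11.0, 9.0], [12.1, 11.1, 9.0]]
loadings, scores = first_principal_component(toy)
print(scores)
```

Samples with similar transcriptomes receive similar scores, which is what the clustering of TF-PGCLCs with day 4 and 6 Ck-PGCLCs in the PCA plot reflects.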

We looked at individual genes upregulated in day 2 TF (BP14A)-PGCLCs 
in comparison to those in EpiLCs/control EpiLCs without 
exogenous TFs but treated with Dox, and found that genes such as 
Blimp1, Prdm14, Tfap2c, stella, Sox2, Klf2, Tcl1, Esrrb, Elf3, Kit, Lifr, 
Nr5a2, Gjb3, Tdh, Spnb3, Pygl, Mbp, Npnt and AU015386 showed 
robust upregulation. All of these genes ('core PGC genes') were upregulated 
in day 4 and 6 Ck-PGCLCs and in E9.5 PGCs (Supplementary 
Fig. 6b, Supplementary Table 2). We examined the genes upregulated 
in day 2 Ck-PGCLCs but not in day 2 TF-PGCLCs in comparison to 
those in EpiLCs, which revealed that the genes Hoxa1, Hoxb1, Hoxb2, 
Evx1, T, Cdx1, Cdx2, Hand1, Snai1, Mesp1, Id1, Msx1, Msx2, Nkx1-2, 
Isl1, Mixl1, Rspo3, Wnt5a, Fgf8 and Bmp4 (all of which show transient 
upregulation in PGC precursors at around E7.0 and represent a somatic 
mesodermal program) were transiently upregulated in day 2 Ck-PGCLCs 
but not in day 2 TF-PGCLCs, and most of these genes were downregulated 
in day 4 and 6 Ck-PGCLCs and in E9.5 PGCs (Supplementary Fig. 6b 
and Supplementary Table 2). Genes up-/downregulated in SC ES cells 
by BP14A showed no correlation to those in TF-PGCLCs (Supplementary 
Fig. 6c and Supplementary Table 3). These findings demonstrate 
on a genome-wide scale that PGC/PGCLC specification by BMP4 activates 
both a key PGC program and a somatic mesodermal program, 
the latter of which is eventually repressed by the former, and that the 
direct activation of key TFs confers EpiLCs with the key PGC program 
but not the somatic mesodermal program. 

Figure 2 | Global transcription profiles for TF- and Ck-PGCLCs and 
epigenetic properties of TF-PGCLCs. a, PCA of the transcriptomes of the 
indicated cells. PGCLCs were sorted by BV. D, day. b, Immunofluorescence 
analysis for H3K9me2 (top) or H3K27me3 (bottom) in day 4 TF (BP14A)- 
PGCLCs (green fluorescent protein (GFP)-positive, dotted lines, line 3-3) 
compared to those in EpiLCs (DNMT3B-positive, line 3-3). Left, 
4',6-diamidino-2-phenylindole (DAPI) staining. Scale bar, 20 µm. c, Bisulphite 
sequence analysis of methylated cytosine in the differentially methylated 
regions of the imprinted genes in EpiLCs and day 4 BV-positive TF (BP14A, 
line 3-3) PGCLCs. White and black circles represent unmethylated and 
methylated cytosines, respectively. 
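
The two gene classes discussed here are defined in the Methods by fold-change rules, which can be sketched on log2 expression values (fourfold = 2 units, one half = -1 unit). The dictionary keys, gene labels and numbers below are hypothetical, and reading 'day 2 PGCLCs' in the mesodermal criterion as day 2 Ck-PGCLCs is an assumption.

```python
from typing import Dict

FOURFOLD = 2.0  # log2(4)
HALF = -1.0     # log2(1/2)


def is_core_pgc_gene(e: Dict[str, float]) -> bool:
    """Up more than fourfold in day 2 BP14A cells versus both controls,
    and not below half of that level in E9.5 PGCs."""
    tf = e["tf_d2"]
    return (
        tf - e["epilc"] > FOURFOLD
        and tf - e["parental_dox_d2"] > FOURFOLD
        and e["e9_5_pgc"] - tf >= HALF
    )


def is_somatic_mesodermal_gene(e: Dict[str, float]) -> bool:
    """At least fourfold up in day 2 cytokine-induced cells versus both
    EpiLCs and day 2 BP14A-induced cells."""
    ck = e["ck_d2"]
    return ck - e["epilc"] >= FOURFOLD and ck - e["tf_d2"] >= FOURFOLD


# Made-up log2 values for two placeholder genes.
expression = {
    "geneA": {"epilc": 6.0, "parental_dox_d2": 6.2, "tf_d2": 10.0, "ck_d2": 7.0, "e9_5_pgc": 11.0},
    "geneB": {"epilc": 5.0, "parental_dox_d2": 5.0, "tf_d2": 5.5, "ck_d2": 9.0, "e9_5_pgc": 5.0},
}
core = sorted(g for g, e in expression.items() if is_core_pgc_gene(e))
mesodermal = sorted(g for g, e in expression.items() if is_somatic_mesodermal_gene(e))
print(core, mesodermal)
```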

We evaluated the epigenetic profiles of TF-PGCLCs. Compared to 
EpiLCs showing high DNMT3B, BV-positive day 4 TF-PGCLCs were 
extremely weak for DNMT3B and showed reduced histone H3 lysine 9 
di-methylation (H3K9me2) and increased H3K27 tri-methylation 
(H3K27me3) (Fig. 2b and Supplementary Fig. 7). BV-positive day 4 
TF-PGCLCs retained the imprints on paternally imprinted H19 and 
maternally imprinted Snrpn (Fig. 2c). These findings suggest that the 
BV-positive day 4 TF-PGCLCs acquire an epigenome similar to day 6 
Ck-PGCLCs and to migrating PGCs at E8.5-E9.5 (refs 1, 14). 

We went on to gain insights into the targets of the three TFs. The 
induction kinetics by the TFs of endogenous Blimp1, Prdm14 and Tfap2c 
in whole EpiLC aggregates showed that (1) BLIMP1 and PRDM14 activate 
Blimp1 gradually and Tfap2c rapidly, whereas TFAP2C activates neither 
Blimp1 nor Tfap2c, but enhances the activity of BLIMP1 to induce Tfap2c 
and Blimp1; (2) activation of Prdm14 by the TFs occurs relatively late 
(after 24 h); and (3) BP14A has the strongest effect on the activation of 
Blimp1, Tfap2c and subsequently Prdm14 (Supplementary Fig. 8). We 
determined the transcriptomes induced by the TFs at 24 h in whole 
EpiLC aggregates. Unsupervised hierarchical clustering and PCA showed 
that PRDM14 alone induces a transcriptome similar to that induced by 
BP14A (Fig. 3a, Supplementary Fig. 9a and Supplementary Table 4). 
Accordingly, Gene Set Enrichment Analysis (GSEA) showed that core 
PGC genes are enriched in genes induced by PRDM14 and, to a lesser 
extent, in genes induced by BLIMP1, but show no significant enrich- 
ment in genes induced by TFAP2C (Fig. 3b). Thus, PRDM14 has a 
central role in the BP14A-induced PGCLC formation. 

We analysed individual genes regulated by PRDM14 and BLIMP1 
with regard to the regulation of corresponding genes by BP14A. We 
classified the genes regulated by PRDM14 and BLIMP1 into those 
regulated in the same (a) (up or down by both P14 and B) and opposite 
(b) (up by P14 and down by B, or vice versa) direction. This revealed a 
trend in which the expression-level changes of the type (a) genes were 
enhanced (Fig. 3c and Supplementary Fig. 9b-d), whereas the type (b) 
genes were ‘balanced’ (Supplementary Fig. 9e, f) by BP14A. Importantly, 
the core PGC genes were overrepresented by genes upregulated by both 
PRDM14 and BLIMP1 (Fig. 3c). We scrutinized individual genes up-/ 
downregulated by PRDM14 and BLIMP1. We found that PRDM14 
upregulates genes associated with pluripotency (such as Epas1, Tcl1, 
Esrrb, Klf5, Nr5a2, Zfp42, Klf4, Lifr, Dppa2, Dppa5a and Nanog), and 
downregulates genes associated with neural differentiation (Zfp521, 
Sox3, Nrcam and Hs6st2). Although we did not find a clear functional 
category of genes upregulated by BLIMP1, we noted that BLIMP1 
represses targets of OCT4 and SOX2 (Fgf4, Lefty1 and Lefty2), among 
others (Supplementary Fig. 9c-f and Supplementary Table 5). Some of 
the targets of PRDM14 were in common between ES cells and EpiLCs 
(upregulated: Tcl1, Esrrb and Klf5; downregulated: Hs6st2 and Dnmt3b) 
(Supplementary Fig. 9c-f). These findings reveal that PRDM14 and 
BLIMP1 function primarily and co-operatively, whereas TFAP2C has 
an auxiliary role, at least for BLIMP1, for the acquisition of the TF- 
PGCLC transcriptome. The finding that BLIMP1 makes a relatively small 
contribution to the TF-PGCLC transcriptome would be attributable to 
TF-PGCLCs lacking the cytokine-induced somatic mesodermal program, 
which BLIMP1 has an essential role in repressing. 
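
The split into type (a) and type (b) genes can be sketched as a comparison of the direction of change under PRDM14 and under BLIMP1. The inputs are assumed to be log2 fold changes against a control, and the cut-off of one log2 unit is a hypothetical choice; the text does not state the threshold used.

```python
from typing import Optional


def direction(change: float, threshold: float) -> int:
    """+1 for up, -1 for down, 0 when the change is below the cut-off."""
    if change >= threshold:
        return 1
    if change <= -threshold:
        return -1
    return 0


def classify_gene(p14_change: float, b_change: float, threshold: float = 1.0) -> Optional[str]:
    """Return 'a' if both TFs move the gene the same way, 'b' if they move it
    in opposite ways, and None if either TF leaves it unchanged."""
    d_p14 = direction(p14_change, threshold)
    d_b = direction(b_change, threshold)
    if d_p14 == 0 or d_b == 0:
        return None
    return "a" if d_p14 == d_b else "b"


print(classify_gene(2.0, 1.5), classify_gene(2.0, -1.5), classify_gene(0.3, 2.0))
```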

After Dox withdrawal, the TF-PGCLCs shut off exogenous TFs, but 
continue their endogenous transcription program, and may therefore 
serve as precursors for proper spermatogenesis. To explore this possibility, 
we induced TF-PGCLCs (BP14A, BP14, P14A, P14), purified the 
BV-positive cells (including SC-positive cells, at days 3, 4 and 6) (Fig. 4a 
and Supplementary Table 6), and transplanted them into seminiferous 
tubules of neonatal W/Wv mice lacking endogenous germ cells. 
Notably, ten weeks after transplantation, the testes transplanted with 
the TF-PGCLCs, particularly those sorted at days 3 and 4, irrespective 
of the TFs used, contained numerous tubules with signs of spermato- 
genesis (Fig. 4b, Supplementary Fig. 10a and Supplementary Table 6). 
These tubules showed normal spermatogenesis and contained spermato- 
zoa with proper morphology (Fig. 4c, d and Supplementary Fig. 10b). 
The control day 6 Ck-PGCLCs also contributed to spermatogenesis 
(Supplementary Table 6). By contrast, SC ES cells induced by BP14A 
did not contribute to spermatogenesis, but formed foci of teratomas in 
six out of eight transplanted testes (Supplementary Fig. 10d, e and Sup- 
plementary Table 6). The inability of day 6 TF-PGCLCs to contribute 
to spermatogenesis may result from a prolonged non-optimal culture 
of TF-PGCLCs (Supplementary Table 6). We fertilized wild-type oocytes 
with TF-PGCLC-derived spermatozoa by intracytoplasmic sperm 
injection. The resultant zygotes developed into two-cell embryos. 
We transferred these embryos into pseudopregnant female mice and, 
18 days later, obtained healthy offspring with grossly normal placenta 
and imprinting states (Fig. 4e-h, Supplementary Fig. 10c, f and Supplementary 
Table 7). These offspring bore transgenes for the exogenous 
TFs and the BVSCR26rtTA (Supplementary Fig. 10g), and grew 
up normally into fertile adults (Fig. 4i, Supplementary Fig. 10h, i and 
Supplementary Table 7). We conclude that the TF-PGCLCs function 
as bona fide precursors for spermatogenesis and healthy offspring. 

Figure 4 | Spermatogenesis and fertile offspring from TF (BP14A)- 
PGCLCs. a, FACS of BV-positive TF (BP14A, line 3-10)-PGCLCs for injection. 
b, Seminiferous tubules of a W/Wv mouse injected with TF-PGCLCs showing 
spermatogenesis. An arrow indicates an empty tubule. Scale bar, 500 µm. 
c, Haematoxylin and eosin staining of a section of a W/Wv mouse testis injected 
with TF-PGCLCs showing spermatogenesis. The asterisk indicates an empty 
tubule. c-e, Scale bar, 50 µm. d, Spermatozoa (arrows) from TF-PGCLCs. 
e, Zygotes at pronuclear stages generated by injection of TF-PGCLC-derived 
spermatozoa into wild-type oocytes by intracytoplasmic sperm injection. 
f, Two-cell embryos from zygotes in e. g, h, Apparently normal offspring 
(g, h) and placenta (g) derived from TF-PGCLC-derived spermatozoa. i, A 
fertile female derived from a TF-PGCLC-derived spermatozoon. j, A proposed 
model for TF-PGCLC induction. 

We have demonstrated the transcriptional logic by which BLIMP1, 
PRDM14 and TFAP2C activate a key PGC program and create the TF- 
PGCLCs (Fig. 4j). On the basis of our findings, it should be feasible to 
explore TF-based regulation of further crucial processes of germ-cell 
development. The TF-based control of germ-cell development may be 
applicable to mammals other than mice, including humans. 


METHODS SUMMARY 


BVSCR26rtTA ES cells were established and maintained under the N2B27 '2i + 
LIF' condition. PB-TET vectors with Avi-Blimp1, 3×FLAG-Prdm14 and V5-Tfap2c 
(Supplementary Table 8) were transfected into the BVSCR26rtTA ES cells (XY 
karyotype) on feeders with Lipofectamine2000 (Invitrogen) together with pPBCAG-hph 
and pCAGGS-mPB plasmids. Transgene integration was determined by PCR and 
Southern blotting. Transfected ES cells were adapted to a feeder-free condition and 
differentiated into EpiLCs as reported previously. After 36 h of differentiation, cells were 
aggregated (2,000 cells per aggregate) in GK15 medium (ref. 1) with Dox (1.5 µg ml⁻¹) 
(Clontech). Ck-PGCLCs were induced by BMP4, BMP8A, SCF, LIF and EGF as 
described previously. Transcriptomes were analysed by GeneChip Mouse Genome 430 2.0 
Array (Affymetrix). Published data (Gene Expression Omnibus (GEO) accessions 
GSM744093-GSM744096 and GSM744099-GSM744104; ref. 1) were included in 
the analysis. Seminiferous tubule injection and intracytoplasmic sperm injection 
were performed as described previously. All primer sequences for PCR are listed in 
Supplementary Table 9. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 


Animals. All animal experiments were conducted according to the Guidelines for 
Animal Experiments of Kyoto University. The BVSC transgenic mice (C57BL/6 
background, acc. no. BV, CDB0460T; SC, CDB0465T: 
http://www.cdb.riken.jp/arg/TG%20mutant%20mice%20list.html) were established as reported previously. 
B6;129-Gt(ROSA)26Sor<tm1(rtTA*M2)Jae> Col1a1<tm2(tetO-Pou5f1)Jae>/J mice (stock number: 
006911) were purchased from the Jackson Laboratory. WBB6F1-W/Wv mice 
were purchased from SLC. 
Establishment of ES cells. Mice homozygous for the Rosa26-rtTA knock-in allele were 
obtained by crossing of B6;129-Gt(ROSA)26Sor<tm1(rtTA*M2)Jae> Col1a1<tm2(tetO-Pou5f1)Jae>/J 
mice heterozygous for both loci. They were mated with BVSC transgenic mice and 
blastocysts were recovered at E2.5. BVSC-R26rtTA ES cells were selected by PCR 
genotyping, and established and maintained under the N2B27 '2i + LIF' condition. 
A male cell line was used in this study. 
Chimaera formation assay. BVSC-R26rtTA ES cells were trypsinized and a single- 
cell suspension was prepared. Approximately 15 ES cells per embryo were injected 
into blastocoels of E3.5 blastocysts obtained from ICR (albino) female mice with a 
piezo-actuating micromanipulator. Injected embryos were transferred into the 
uteri of E2.5 pseudopregnant ICR female mice. Chimaeric mice were delivered 
by caesarean section at E18.5. Chimaerism was determined by coat-colour. The 
chimaeric mice were subjected to test breeding with ICR female mice to confirm 
the germ-line contribution. 
Vector construction. The mouse Blimp1 (from ATG in exon 3) and Tfap2c 
variant 1 (accession number: NM_009335.2) coding sequences were cloned by 
PCR flanked with SalI-AviTag-XhoI and NotI sites and NotI and EcoRI sites, 
respectively. The Prdm14 coding sequence was obtained from AG-P14 (ref. 16). 
The SalI-Avi-Blimp1-NotI cassette was subcloned into XhoI/NotI sites of the 
pPyCAG-cHA-IP plasmid, and this cassette was subcloned again into the EcoRI/NotI 
sites of the pENTR1A dual selection vector (Invitrogen). For Prdm14 and 
Tfap2c, KpnI-3×FLAG-XhoI-S(G4S)3_Linker-SpeI and BamHI-V5-S(G4S)3_Linker-NotI 
fragments, respectively, were attached to the amino termini by PCR or synthesized 
oligonucleotide linker ligation. 3×FLAG-Prdm14 and V5-Tfap2c cassettes were 
subcloned into the KpnI/NotI and BamHI/EcoRI sites of pENTR1A, respectively. 
Lastly, they were shuttled into the PB-TET destination vector (Addgene) with LR 
clonase II enzyme mix (Invitrogen). To construct pPBCAG-hph, a CAG promoter 
fragment from the pCAGGS plasmid obtained by digestion with SpeI and EcoRI 
(filled) was inserted into the GG131 vector digested with SpeI/MscI. All sequences 
engineered by PCR or oligonucleotide synthesis were confirmed. All attached 
sequences are shown in Supplementary Table 8 and primer sequences for cloning 
are listed in Supplementary Table 9. 
Transfection and selection of subclones. BVSC-R26rtTA ES cells were transfected 
with PB-TET vectors containing key factors, pPBCAG-hph and pCAGGS-mPB 
using Lipofectamine2000 (Invitrogen) on feeder cells (mouse embryonic fibroblasts) 
in a 60-mm dish under a 2i + LIF condition. The total amount of vector DNA was 
below 8 µg. Transfectants were selected with hygromycin B (150 µg ml⁻¹) (Sigma) 
and genotyped with PCR for transgenes. The primer sequences for the genotype 
are shown in Supplementary Table 9. 
Southern blotting. Eight micrograms of genomic DNA was isolated and digested 
with BamHI. DNA fragments were electrophoresed in 0.7% agarose gel, transferred 
to Hybond N+ (GE Healthcare) and ultraviolet-crosslinked. The β-geo probe was 
obtained by digestion of PB-TET with RsrII/SmaI, labelled with ³²P (PerkinElmer) 
by a random primer DNA labelling kit ver. 2.0 (TaKaRa) and purified with an 
Illustra ProbeQuant spin column (GE Healthcare). Radioisotope images were cap- 
tured with a BAS system (Fujifilm). 
Karyotyping. Metaphase chromosome spreads of ES cells were prepared by treating 
them with demecolcine solution (0.03 µg ml⁻¹) for 2 h at 37 °C, followed by hypotonic 
treatment with 75 mM potassium chloride for 15 min at room temperature. 
Cells were fixed in Carnoy’s solution (3:1 mixture of methanol and acetic acid) and 
dropped onto glass slides soaked in 50% ethanol before the analysis. The presence 
of the Y chromosome was determined by PCR with Ube1 primers. Cell lines 
including 2-4, 7-1, 7-5 and 7-8 were found to have the XO karyotype. The primer 
sequences are described in Supplementary Table 9. 
TF- and Ck-PGCLCs. Transfected ES cells were adapted to a feeder-free condition 
before induction. EpiLC differentiation was performed as reported previously. After 
36 h of differentiation, cells were collected and cultured in a Lipidure-Coat 96-well 
plate (NOF) to be aggregated (started with 2,000 cells per well) in GK15 (ref. 1) with 
1.5 µg ml⁻¹ of Dox (Clontech). The TF-PGCLCs can be induced with 1,000-5,000 
(possibly more) cells per well. We used 2,000 cells per well in this study, as this 
condition was most efficient for obtaining BVSC-positive TF-PGCLCs. PGCLCs 
were induced by BMP4 (500 ng ml⁻¹), BMP8A (500 ng ml⁻¹), SCF (100 ng ml⁻¹), 
LIF (1,000 U ml⁻¹) and EGF (50 ng ml⁻¹) as previously described. LDN193189 
(120pM; Stemgent) was added concurrently with Dox or cytokines. Aggregates 
from ground-state ES cells were also cultured in GK15 with Dox as described 
above. 

Reverse transcription and qPCR. For evaluating endogenous transcripts, TF- 
induced BV-positive cells were FACS-sorted on days 2 and 4 with the gates shown 
in Fig. 1c and Supplementary Fig. 3c. The sorting gates used for day 2 and 4 Ck- 
PGCLCs are shown in Supplementary Fig. 6a. Aggregates were trypsinized and 
lysed as a whole unless otherwise specified. Total RNA was purified with an 
RNeasy Micro kit (QIAGEN) and reverse transcription was performed with Super- 
Script III (Invitrogen) primed with oligo-dT primer according to the manufacturer's 
protocol. Real-time PCR was performed with Power SYBR (Applied Biosystems) 
and CFX384 (BioRad). The gene expression levels are presented as ΔCt (in log2 
scale) normalized with the average Ct values of Arbp and Ppia. 

To discriminate endogenous transcripts from exogenous ones, both the oligo- 
dT primer (Invitrogen) and gene-specific primers of interest were used for reverse 
transcription to reduce the reverse transcription bias due to differences in the 
distance between reverse transcription priming sites and amplified regions. The 
amplification efficiency of the newly designed primer sets was determined with 
pGEM-T-Easy plasmids containing the corresponding amplicons as templates. To 
verify both the endogenous and exogenous expression levels, samples were tested 
with CDS primers concurrently (data not shown). We found that the amplification 
efficiency of the 3×FLAG-Prdm14 primer set was slightly lower than that of the 
other primer sets. Therefore, we adjusted the Ct value of 3×FLAG-Prdm14 using 
the equation: 

$$cC_{t(3\times\text{FLAG-Prdm14})} = 1.017 \times C_{t(3\times\text{FLAG-Prdm14})} - 1.039$$

in which $cC_{t(3\times\text{FLAG-Prdm14})}$ and $C_{t(3\times\text{FLAG-Prdm14})}$ are the adjusted Ct values and 
original Ct values obtained by the experiment using the 3×FLAG-Prdm14 primer 
set, respectively. This equation was determined by linear regression between the 
$C_{t(3\times\text{FLAG-Prdm14})}$ and Ct values of the β-geo primer set in cell lines containing 
3×FLAG-Prdm14 but not the other two TFs (data not shown). In this paper, 
$cC_{t(3\times\text{FLAG-Prdm14})}$ was used for calculation of the ΔCt values for the 3×FLAG-Prdm14 
primer set. 
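
The correction and the normalization can be combined in a few lines. The coefficients come from the equation above; the sign convention (reference average minus gene Ct, so that larger values mean higher expression on a log2 scale) is an assumption, as are the function names and the example Ct values.

```python
def adjust_flag_prdm14_ct(ct: float) -> float:
    """Apply the linear correction reported for the 3xFLAG-Prdm14 primer set."""
    return 1.017 * ct - 1.039


def delta_ct(ct_gene: float, ct_arbp: float, ct_ppia: float) -> float:
    """Delta Ct relative to the mean Ct of the two reference genes.

    Assumed convention: reference minus gene, so higher means more transcript.
    """
    reference = (ct_arbp + ct_ppia) / 2.0
    return reference - ct_gene


# Made-up raw Ct values for one sample.
raw_ct = 25.0
adjusted = adjust_flag_prdm14_ct(raw_ct)
print(adjusted, delta_ct(adjusted, ct_arbp=18.0, ct_ppia=20.0))
```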

The primer sequences are listed in Supplementary Table 9 (refs 12, 21). 
LacZ staining. Cell aggregates at 12 h were trypsinized and fixed with 2% paraformaldehyde 
and 0.2% glutaraldehyde. Fixed cells were spread with Cytospin 4 
(Thermo Scientific) and stained with LacZ staining solution overnight. 
Flow cytometric analysis and cell sorting. The sample preparations from cell 
aggregates were performed essentially as described previously. FACS was performed 
with a FACSAria or FACSAria III (BD) cell sorter. BV and SC fluorescence 
were detected with the FITC and AmCyan Horizon V500 channel, respectively. 
Data were analysed with FACSDiva (BD) or FlowJo (Tree Star) software. 
Immunofluorescent staining. BV-positive cells from BP14A-induced day 4 aggre- 
gates were sorted with the gate shown in Supplementary Fig. 7, mixed with EpiLCs 
at a ratio of 1:1 and spread onto MAS-coated glass slides (Matsunami). Immunofluorescent 
staining was performed as reported previously. The primary antibodies 
used in this study were as follows: anti-GFP (rat monoclonal antibody; Nacalai 
Tesque GF_090R), anti-DNMT3B (mouse monoclonal antibody; Imgenex IMG- 
184A), anti-H3K27me3 (rabbit, polyclonal antibody; Millipore 07-449), and anti- 
H3K9me2 (rabbit, polyclonal antibody, Millipore 07-441). Secondary antibodies 
were as follows: Alexa Fluor 568 anti-rabbit IgG, Alexa Fluor 488 anti-rat IgG and 
Alexa Fluor 647 anti-mouse IgG (all three from Invitrogen, A11011, A11006 and 
A21235, respectively). Images were captured with a confocal laser scanning micro- 
scope (Olympus FV1000). 
Bisulphite sequencing. Genomic DNA was isolated and bisulphite treatment was 
conducted with an EpiTect bisulphite kit (QIAGEN) according to the manufacturer's 
protocol. The differentially methylated regions of Snrpn, H19, Igf2r, Peg1 
and Peg3 were amplified by PCR as previously reported. Sequences were determined 
and analysed with QUMA (http://quma.cdb.riken.jp/top/index.html). 
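
For orientation, the logic behind a bisulphite methylation call can be sketched as follows: after conversion and PCR, an unmethylated cytosine reads as T whereas a methylated cytosine still reads as C. This toy code is not QUMA and does no alignment; it assumes each read is already aligned to the reference without gaps, and the sequences are invented.

```python
from typing import List, Optional


def cpg_calls(reference: str, read: str) -> List[Optional[bool]]:
    """Call each CpG of the reference: True = methylated (C retained),
    False = unmethylated (C read as T), None = any other base."""
    if len(reference) != len(read):
        raise ValueError("read must be aligned to the reference without gaps")
    ref, seq = reference.upper(), read.upper()
    calls: List[Optional[bool]] = []
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            if seq[i] == "C":
                calls.append(True)
            elif seq[i] == "T":
                calls.append(False)
            else:
                calls.append(None)
    return calls


def methylation_fraction(reference: str, reads: List[str]) -> float:
    """Fraction of informative CpG calls that are methylated across reads."""
    calls = [c for read in reads for c in cpg_calls(reference, read) if c is not None]
    return sum(calls) / len(calls) if calls else float("nan")


reference = "ACGTCGACG"               # three CpG sites (toy sequence)
reads = ["ACGTTGACG", "ATGTTGATG"]    # one partly methylated clone, one unmethylated
print(cpg_calls(reference, reads[0]), methylation_fraction(reference, reads))
```

Each list of calls corresponds to one row of circles in a bisulphite plot, with True drawn as a black circle and False as a white one.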
cDNA amplification and microarray analysis. For characterization of TF- 
PGCLCs, the cell populations surrounded with red rectangles in Supplementary 
Fig. 6a were sorted by FACS. Note that the background level was different in 
sorting of Ck-PGCLCs and TF-PGCLCs on day 2. Total RNA was isolated with 
an RNeasy micro kit (QIAGEN) and reverse transcription and cDNA amplifica- 
tion were conducted as previously described’”’. Samples were analysed with a 
GeneChip Mouse Genome 430 2.0 Array (Affymetrix). Data were normalized with 
dChip software and are shown in log2 scale’’. Probes that exhibited a ‘present’ call 
in at least one sample were included in the analysis. Probe selection criteria for 
further analysis were as follows: (1) maximum expression score ≥ 8; (2) maximum 
differential expression level ≥ 2; and (3) exhibiting the highest expression level 
among multiple probes for a gene, if any. Published data (GEO accession GSE30056 
(two replicates of ES cells (GSM744093 and GSM744094), EpiLCs (GSM744095 
and GSM744096), epiblast (GSM744099 and GSM744100), Ck-PGCLCs d6 
(GSM744101 and GSM744102), and E9.5 PGCs (GSM744103 and GSM744104))) 
were also included’. We selected 4,464 probes and performed PCA with R (version 
2.15.1)**. For the differential gene expression analysis, we averaged biological dupli- 
cates or quadruplicates and selected core PGC genes (Supplementary Fig. 6b, top) 
and ‘somatic mesodermal genes’ (Supplementary Fig. 6b, bottom) according to the 
following criteria. Core PGC genes were (1) upregulated in BP14A-induced day 2 
more than fourfold compared with both EpiLCs and the parental clone with Dox 
day 2; and (2) not downregulated in E9.5 PGCs (the expression level was not less 
than a half of BP14A-induced day 2). Somatic mesodermal genes exhibited at 
least fourfold upregulation in day 2 PGCLCs as compared with both EpiLCs and 
BP14A-induced day 2. Representative genes are specified. The complete list is 
shown in Supplementary Table 2. To compare the genes whose expression was 
altered by BP14A between EpiLCs and ES cells, differences in log2 scale between 
the averaged expression value of day 2 TF-PGCLCs and EpiLCs, between that of 
SC-positive cells induced by BP14A in ES cells and ES cells, and between that of Ck- 
PGCLCs day 4 and EpiLCs were calculated. The 100 genes exhibiting the most 
altered expression in EpiLCs and ES cells are listed in Supplementary Table 3. 
To identify the target genes of each transcription factor, total RNA was isolated 
with an RNeasy Micro kit (QIAGEN) from both EpiLCs and whole cells from 
day 1 aggregates on each of the BP14A (line 3-3), B (line 2-4), P14 (line 7-109), and 
A (line 8-2) cell lines and the parental clone cultured with Dox. Data acquisition 
with microarray and data normalization was performed as described above. We 
first selected 38,769 probes that exhibited a present call in at least one sample and 
performed GSEA"* with core PGC genes defined above as a gene set. Genes were 
ranked by the difference of the expression levels (in log2 scale) between day 1 
aggregates and EpiLCs for each cell line. For further analysis, 4,211 probes were 
selected according to the following criteria: (1) maximum expression score ≥ 6; (2) 
maximum differential expression level ≥ 1; (3) P value of one-way analysis of 
variance for 10 groups (each group containing two biological replicates) ≤ 0.10 
(false discovery rate ≤ 0.10, calculated by the ‘qvalue’ software package” in the 
program R); and (4) exhibiting the highest expression level among multiple probes for 
a gene, if any. Unsupervised hierarchical clustering and PCA were performed with 
R (version 2.15.1)*. To identify a gene whose expression was altered by each of the 
transcription factors, we averaged biological replicates and calculated the differ- 
ential expression level between day 1 aggregates and EpiLCs in each cell line. The 
differential expression level of the parental clone was further subtracted (in log2 scale) 
from that of the transfected lines to exclude the effects of aggregation formation. 
We defined a gene exhibiting this value (fold change from the parental clone) ≥ 1 
or ≤ −1 as up- or downregulated by each factor, respectively. The accession number 
for the microarray data presented in this study is GSE46855 (GEO database). 
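
The parental-clone correction described above can be illustrated with a short sketch. It is a minimal example under stated assumptions: the function names, the example values and the cut-offs of +1 and −1 (in log2 scale) are illustrative choices, not code or data from the study.

```python
def factor_effect(day1, epilc, parental_day1, parental_epilc):
    """Log2 change attributable to a factor, net of the aggregation effect.

    Every argument is a list of log2 expression values (one per gene),
    already averaged over biological replicates.
    """
    return [
        (d - e) - (pd - pe)  # change in the line minus change in the parental clone
        for d, e, pd, pe in zip(day1, epilc, parental_day1, parental_epilc)
    ]


def classify(effect, cutoff=1.0):
    """Label each gene 'up', 'down' or 'unchanged' (cutoff is an assumed threshold)."""
    labels = []
    for value in effect:
        if value >= cutoff:
            labels.append("up")
        elif value <= -cutoff:
            labels.append("down")
        else:
            labels.append("unchanged")
    return labels


# Hypothetical log2 values for three genes (not data from the study).
effect = factor_effect(
    day1=[10.0, 6.0, 8.0],
    epilc=[7.0, 8.0, 8.0],
    parental_day1=[8.0, 8.5, 8.2],
    parental_epilc=[7.5, 8.0, 8.0],
)
labels = classify(effect)
```

Subtracting in log2 scale corresponds to dividing fold changes, so a value of 1 represents a twofold change relative to the parental clone.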


Seminiferous tubule injection. After the designated cell populations were sorted 
by FACS, 1 X 10° cells per testis were injected into the neonatal testes of W/W" 
mice (7 days post partum) basically as previously described'’. Anti-mouse CD4 
antibody (50 mg per dose, clone GK1.5; eBioscience or Biolegend) was injected 
intraperitoneally at day 0, 2 or 4 for immunosuppression as necessary’. The 
transplanted testes were analysed 10 weeks after injection. For haematoxylin 
and eosin staining, testis samples were fixed with Bouin’s solution, embedded in 
paraffin, and sectioned. 

Intracytoplasmic sperm injection. Intracytoplasmic sperm injection was per- 
formed basically as reported previously"*. In brief, seminiferous tubules with sper- 
matogenesis colonies were gently minced and a spermatogenic cell suspension was 
prepared. Spermatozoa were injected into oocytes recovered from BDF1 mice. 
After in vitro embryo culture, two-cell-stage embryos were transferred into the 
oviducts of pseudopregnant mice at 0.5 days post coitum (d.p.c.) (ICR). Pups were 
delivered by caesarean section at 18.5 d.p.c. The primer sequences used for geno- 
typing PCR are described in Supplementary Table 9. 
The pluripotent genome in three dimensions is 
shaped around pluripotency factors 
It is becoming increasingly clear that the shape of the genome importantly 
influences transcription regulation. Pluripotent stem cells 
such as embryonic stem cells were recently shown to organize their 
chromosomes into topological domains that are largely invariant 
between cell types’. Here we combine chromatin conformation cap- 
ture technologies with chromatin factor binding data to demonstrate 
that inactive chromatin is unusually disorganized in pluripotent 
stem-cell nuclei. We show that gene promoters engage in contacts 
between topological domains in a largely tissue-independent manner, 
whereas enhancers have a more tissue-restricted interaction profile. 
Notably, genomic clusters of pluripotency factor binding sites find 
each other very efficiently, in a manner that is strictly pluripotent- 
stem-cell-specific, dependent on the presence of Oct4 and Nanog 
protein and inducible after artificial recruitment of Nanog to a 
selected chromosomal site. We conclude that pluripotent stem cells 
have a unique higher-order genome structure shaped by pluripo- 
tency factors. We speculate that this interactome enhances the robust- 
ness of the pluripotent state. 

In recent years, several technological advances have made it possible 
to delineate the three-dimensional shape of the genome’. Spatial 
organization of DNA has been recognized as an additional regulatory 
layer of chromatin, important for gene regulation and transcriptional 
competence**. In somatic cells active and inactive chromosomal 
regions are spatially segregated®’. Recently, the genome was further 
shown to be subdivided into evolutionarily conserved topological 
domains’”. 

4C (chromosome conformation capture combined with sequen- 
cing) is a genome-scale variant of the 3C technology’, which examines 
the spatial organization of DNA and measures the contact frequencies 
of a chosen genomic site, or ‘viewpoint’, with the rest of the genome. To 
assess chromosome topology in mouse E14 embryonic stem cells 
(IB10), we generated high-resolution contact maps using 4C sequen- 
cing’ for a series of individual sites representative of different chromo- 
somal regions on various chromosomes (Supplementary Fig. 1 and 
Methods). All 4C experiments show the typical result of a chro- 
mosome conformation capture experiment, with the bulk of the signal 
close to the viewpoint, intrachromosomal captures outnumbering 
interchromosomal captures, and clustering of captures at distal sites®” 
(Supplementary Fig. 2 and Supplementary Table 1). To identify sig- 
nificant intra- and interchromosomal contacts, we used a windowing 
approach in combination with a false discovery rate (FDR) analysis 
that determines significant clustering of independently captured 
sequences (FDR, α = 0.01; ref. 10). Contacts in this case can mean 
either direct interactions between the chromatin of chromosomal 
regions or indirect contacts via intermediate protein complexes. 3D- 
DNA fluorescence in situ hybridization (FISH) experiments validated 
the 4C results (Supplementary Fig. 3). 


Different from what is observed in somatic cells, in embryonic stem 
cells we find that transcriptionally inactive regions form low numbers 
of specific long-range contacts (Fig. 1a, b). This is not due to their 
inability to reach over large distances, but instead to a more random 
organization of their long-range captures (Supplementary Fig. 4), 
suggesting that inactive chromatin is spatially less organized in 
pluripotent nuclei. We confirmed these results in an independent, 
129/Cast, embryonic stem (ES) cell line’! (Supplementary Fig. 5a). 
We furthermore show that this is not an intrinsic feature of the selected 
regions as they do engage in many long-range contacts in astrocytes 
(Fig. 1c and Supplementary Fig. 6). For example, the chemoreceptor 
Tas2r110 gene, part of a cluster of taste receptors that is specifically 
expressed in taste buds, engages in only three contacts in ES cells but 
shows 34 specific contacts in astrocytes (Fig. 1d). 

We assessed whether the lack of long-range contacts is a global 
feature of ES cells, by analysing a recently published mouse ES cell 
Hi-C data set’. ‘Virtual 4C’ contact profiles extracted from the Hi-C 
data set (see Methods for details) correlate strongly to our ‘true 4C’ 
profiles (Supplementary Fig. 7), emphasizing the high level of agree- 
ment between the data sets. The Hi-C data confirm on a global scale 
that inactive and active chromatin differ in their propensity to form 
specific long-range contacts in ES cells. Similar to our 4C data, this 
difference is abolished in differentiated cells (cortex), where both active 
and inactive chromatin engage equally well in specific long-range 
contacts (Fig. 1e, f and Supplementary Fig. 8). 

We next asked whether chromosomal organization is reversed 
during cellular reprogramming. 129/Cast neural precursor cells 
(NPCs)? were transduced with a lentivirus containing a multicistronic 
transcript encoding Oct4 (also known as Pou5f1), Klf4, Sox2 and 
c-Myc, to generate induced pluripotent stem (iPS) cells. Quantitative 
PCR (qPCR) expression analysis of several marker genes confirmed 
reprogramming (Supplementary Fig. 9). A reactivated gene (Nanog) 
gains contacts during iPS cell reprogramming, whereas a resilenced 
gene (Ptprz1) loses all but two contacts (Supplementary Fig. 10), dem- 
onstrating that cellular reprogramming is accompanied by the reemer- 
gence of a pluripotency-specific spatial organization of the genome. 

A closer inspection of the intrachromosomal contacts made by the 
Nanog gene revealed another aspect of the 3D pluripotent genome; 
Nanog was found to interact with many genes that are known to have 
an important role in maintenance of ES cell pluripotency, including 
Rybp, Ezh2, Tcf3 and Smarcad1. The ES-cell-specific nature of these 
contacts becomes obvious from a comparison between the ES cell and 
NPC contact profiles of the Nanog gene. Most of the ES-cell-specific 
interacting regions have a high density of binding sites for the plur- 
ipotency factors Oct4, Sox2 and Nanog (Fig. 2a, b). Importantly, we 
also find such preferential associations when we apply 4C to the Sox2 
enhancer (Supplementary Fig. 11). The ES cell Hi-C data’ show that 
Figure 1 | Inactive regions lack specific long-range interactions in 
embryonic stem cells. a, Domainogram analysis (see Methods) shows 4C 
profiles of Retsat, Nanog, Ptprz1 and gene desert (top to bottom) in ES cells. 
Plots represent contact profiles of active (dark green; n = 7) and inactive (dark 
red; n = 8) viewpoints. Below the domainograms, a map of the chromosomal 
position of the genes is plotted. b, Quantification of the number of significantly 
contacted regions for different viewpoints in ES cells. Green bars denote 
viewpoints in active regions, red bars denote viewpoints in inactive regions. 
c, Chromosomal maps show read count distribution for a gene desert (at 
44.5 Mb) and for Tas2r110, for ES cells (red) and astrocytes (blue). The 4C 
signal is calculated using a sliding window average (running mean) of the read 
counts (window size is 51). The vertical axis is maximized at the ninety-fifth 
percentile. d, Quantification of the number of far-cis regions that are 
significantly contacted by a given viewpoint in ES cells (red) or astrocytes 
(blue). GD, gene desert. e, A pairwise contact matrix was generated to calculate 
disorganization scores from the Hi-C data (see Methods). Chromosome 6 was 
segmented into regions with high density of H3K4me1 and low density of 
H3K4me1, as a proxy for active and inactive chromatin. The pairwise contact 
matrix was subdivided into contacts between two regions of high H3K4me1 
density (H3K4me1^high/high) or low H3K4me1 density (H3K4me1^low/low) or 
between a region with low H3K4me1 density and a region with high H3K4me1 
density (H3K4me1^low/high). f, From the distribution of H3K4me1 high and low 
regions, we calculated an expected distribution of long-range contacts, under 
the null hypothesis that there is no difference between active and inactive 
regions with respect to their long-range contacts. An enrichment score is 
calculated by dividing the observed scores by these expected values. 
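
The observed-over-expected calculation in panel f can be sketched as follows. The legend does not give the formula for the expected distribution, so this example assumes that, under the null hypothesis, contacts pair regions at random in proportion to the number of high- and low-density regions. The function names and counts are hypothetical.

```python
def expected_shares(n_high, n_low):
    """Expected share of contacts per class if regions pair at random (assumed null)."""
    total = n_high + n_low
    p = n_high / total  # fraction of regions with high H3K4me1 density
    q = n_low / total   # fraction of regions with low H3K4me1 density
    return {"high/high": p * p, "low/low": q * q, "low/high": 2 * p * q}


def enrichment(observed_counts, n_high, n_low):
    """Divide the observed share of each contact class by its expected share."""
    total = sum(observed_counts.values())
    expected = expected_shares(n_high, n_low)
    return {
        key: (observed_counts[key] / total) / expected[key]
        for key in expected
    }


# Hypothetical counts of long-range contacts (not data from the study).
scores = enrichment(
    {"high/high": 50, "low/low": 10, "low/high": 40},
    n_high=50,
    n_low=50,
)
```

A score above 1 means a contact class occurs more often than the null predicts, and a score below 1 means it occurs less often.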


Nanog-contacting regions also form preferential contacts among each 
other (Fig. 2c). Among the interchromosomal contacts made by the 
Nanog gene are again a large number of pluripotency related genes: 
Mybl2, Dppa5, Rex1 (also known as Zfp42), Zfp281, Lefty1, Lin28a, 
Esrrb, Klf5, Sall1, Cbx5 and Cbx7 (Fig. 2d and Supplementary Fig. 12; 
see Supplementary Table 2 for a full list of contacted regions). The 
contacts of Nanog with Esrrb and Zfp281 were verified by 3D-DNA 
FISH (Supplementary Fig. 12). GREAT analysis’* of these interchro- 
mosomal contacts reveals strong enrichment for genes involved in 
pluripotency and early embryogenesis, which is not observed for unre- 
lated viewpoints or in other tissues (Supplementary Table 3). This 
suggests that pluripotency genes prefer to cluster with other pluripo- 
tency-specific genes. 

We designed a computational strategy, paired-end spatial chro- 
matin analysis (PE-SCAn) (Fig. 2e), which combines chromatin factor 
Figure 2 | Expressed Nanog gene shows preferential interaction with other 
pluripotency genes. a, Chromosomal map of 4C signal for the Nanog gene in 
ES cells and NPCs. Representative 4C data (n = 6 (ES cell) and n = 2 (NPC) 
biological replicates) is normalized to reads per million and plotted as a running 
mean with a window of 31. Bottom panel shows the ES cell to NPC ratio. Red, 
purple and blue rectangles denote the windows with a high density of Nanog-, 
Sox2- and Oct4-binding sites, respectively. High-density is defined as >5 sites 
per 100 kb. b, Violin plots show quantification of ES cell/NPC ratios for regions 
with a high density of binding sites for Nanog, Sox2, Oct4 and all three 
combined. c, Combined Hi-C-4C plot for the telomeric region of chromosome 
6, shows a normalized Hi-C contact matrix (see Methods) with the 4C data for 
Nanog superimposed. Red, purple and blue rectangles show the high-density 
regions, as in a. Green arrowheads point to Nanog high-density (HD) Hi-C 
interactions other than with the Nanog enhancer. d, Examples of 
interchromosomal contacts made by Nanog with pluripotency genes Esrrb’’, 
Sall1 (ref. 28) and Klf5 (ref. 29). See Methods for the definition of the 4C 
enrichment score. Nanog-, Sox2- and Oct4-binding sites are again highlighted 
with red, purple and blue rectangles. e, Schematic depiction of paired-end 
spatial chromatin analysis (PE-SCAn). Hi-C di-tags are sequentially aligned to 
ChIP-seq binding sites. From the total set of distances dx and dy, a normalized 
two-dimensional frequency matrix is calculated (see Methods). Hi-C pairs 
within 5 Mb were excluded to focus the analysis on contacts between, rather 
than within topological domains. f, PE-SCAn plots show the alignment of 
intrachromosomal Hi-C data to high-density clusters of pluripotency factors 
(≥5 sites in 50 kb, Nanog: n = 423, Sox2: n = 607, Oct4: n = 1025). Top row 
shows alignment of ES cell Hi-C data, bottom row shows alignment of cortex 
Hi-C data. 
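
The signal processing described for panel a (normalization to reads per million, then a running mean with a window of 31) can be sketched as below. This is a minimal illustration, not the authors' pipeline. In particular, the window is shortened at the two ends of the profile, which is an assumption because the legend does not say how edges are treated.

```python
def reads_per_million(counts):
    """Scale raw read counts so that the profile sums to one million."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c * 1_000_000 / total for c in counts]


def running_mean(values, window=31):
    """Centred running mean; the window shrinks at the profile edges (assumption)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Toy profile of read counts per restriction fragment (hypothetical values).
rpm = reads_per_million([1, 1, 2])
smooth = running_mean([0, 3, 0, 3, 0], window=3)
```

The same helper applies to the window of 51 used in Fig. 1c by passing `window=51`.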


binding data and Hi-C data to analyse, on a global scale, whether given 
genomic sites (bound by a protein of interest) in different topological 
domains have a preference to interact among each other. PE-SCAn 
shows that individual Nanog-, Sox2- or Oct4-binding sites have little 
preference to contact each other over such large chromosomal 
distances (Supplementary Fig. 13a). However, clusters of Nanog-, 
Oct4- or Sox2-binding sites (5 or more per 50 kilobases (kb)) do show 
a strong preference to interact with each other in ES cells (Fig. 2f). 
When we circularly permute the positions of the Nanog, Sox2 or Oct4 
clusters, this preference is not observed, confirming that these inter- 
actions are specific (data not shown). Moreover, these contacts are 
tissue-specific as they are absent in the cortex (Fig. 2f). 
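
One way to find clusters that meet the stated density (5 or more binding sites per 50 kb) is a sliding window over sorted site positions. The sketch below is an assumed procedure for illustration only; the text gives the threshold but not the algorithm, and the function name and the merging of overlapping windows are hypothetical choices.

```python
def high_density_clusters(positions, min_sites=5, span=50_000):
    """Return (start, end) intervals holding at least min_sites sites within span bp.

    Overlapping qualifying windows are merged into one cluster (assumption).
    """
    sites = sorted(positions)
    clusters = []
    left = 0
    for right in range(len(sites)):
        # Shrink the window until it spans at most `span` base pairs.
        while sites[right] - sites[left] > span:
            left += 1
        if right - left + 1 >= min_sites:
            start, end = sites[left], sites[right]
            if clusters and start <= clusters[-1][1]:
                clusters[-1] = (clusters[-1][0], end)  # extend the previous cluster
            else:
                clusters.append((start, end))
    return clusters


# Hypothetical binding-site coordinates on one chromosome.
found = high_density_clusters(
    [1_000, 5_000, 12_000, 20_000, 30_000, 400_000, 900_000]
)
sparse = high_density_clusters([0, 100_000, 200_000])
```

In this toy input the first five sites lie within 29 kb of each other and form one cluster, whereas the isolated sites do not qualify.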

We also used PE-SCAn to investigate the contribution of other 
factors to the shape of the pluripotent genome. Although CTCF and 
cohesin have both been implicated in higher-order chromatin fold- 
ing’, CTCF has been suggested to predominantly form chromatin 
loops over shorter distances'*"*. Indeed, we find that CTCF, but also 
cohesin-binding sites, contribute little to chromosome folding over 
larger distances (Fig. 3a and Supplementary Fig. 13b). Recent chro- 
mosome architecture experiments have revealed a central role for 
promoters in chromosome topology'®. PE-SCAn for histone H3 
trimethyl Lys 4 (H3K4me3) confirmed that active transcriptional start 
sites are engaged in specific long-range contacts (Fig. 3b). However, 
their contribution is largely tissue-invariant, because promoters 
marked by H3K4me3 in either ES cells or cortex also find each other 
equally well in the corresponding tissue (Fig. 3b). This is different 
for active enhancer sites (H3K27ac’’), which contribute to genome 
topology in a more tissue-restricted manner (Fig. 3b). Pluripotency 
factors, but also cohesin, often bind to enhancer sequences. For Oct4, 
Sox2, Nanog and cohesin we find that 41%, 38%, 35% and 27%, 
respectively, of binding sites overlap with active enhancer sites. All 
intersected enhancer sites show an equal preference for homotypic 
contacts as the unselected enhancers (Fig. 3c). Importantly, the 
preferred contacts among Nanog enhancers were not dependent on 
cohesin (Fig. 3d). Finally, we assessed chromosomal contacts among, 
respectively, enhancer and cohesin clusters (5 or more per 50 kb). We 
found that they have no advantage over isolated sites to interact with 
each other, and that their contact preference is not as pronounced as 
seen for clusters of pluripotency-factor-binding sites (Supplementary 
Fig. 13b). 

To investigate whether this pluripotency-specific genome con- 
figuration is dependent on pluripotency factors, we used ZHBTc4 
(ref. 18) and RCNBH (ref. 19) ES cell lines, which allow the acute 
depletion of Oct4 and Nanog protein, respectively (Fig. 4a, b and 
Supplementary Fig. 14a, b). After Oct4 or Nanog protein removal, 
the overall chromosome topology is largely unaffected (Fig. 4c and 
Supplementary Fig. 14c). However, a close comparison between 
factor-depleted and wild-type contact profiles reveals a decrease in 
contact frequencies specifically at clusters where pluripotency factors 
normally bind (Fig. 4d, e). Quantification confirms that the regions 
with reduced contact frequency after removal of Oct4 or Nanog 
protein are those with a high density of cognate binding sites and 
not, for example, regions with a high density of CTCF-binding sites 
(Fig. 4f, see Methods for details). Of note, partial loss of Nanog by short 
interfering RNA (siRNA)-mediated knockdown (78%) has no effect 
(Supplementary Fig. 15), indicating that full knockout of Nanog 
protein is required to affect chromosome topology in ES cells. 

To test further whether pluripotency factor binding has a direct role 
in this pluripotent-stem-cell-specific genome configuration, we made 
use of a C57BL/6-129S1/SvImJ ES cell line with a 256× lacO repeat 
cassette integrated into the C57BL/6 Nfix allele on chromosome 8 
(Fig. 4g and Supplementary Fig. 16). We targeted green fluorescent 
protein (GFP)-LacR-Nanog fusion proteins to these lacO repeats and 
performed allele-specific 4C (ref. 20) to simultaneously analyse the 
contact spectra of the targeted C57BL/6 and the non-targeted 129S1/SvImJ 
allele. Again, the overall chromosome topology for both alleles 
was highly similar, but several new specific contacts were found for the 
C57BL/6 allele. Notably, these contacts coincide with high-density 
Nanog-binding sites around the pluripotency genes Sall1 and Klf2 
and the Irx cluster of developmental regulators. Circular permutation 
Figure 3 | Spatial interactome of chromatin factors is revealed by PE-SCAn. 
a-d, PE-SCAn plots for various chromatin factors. Plots are the same as in 
Fig. 2e, but with a different scale on the vertical axis. Note that the height of the 
data is colour-coded according to the colour bar shown in a. Top row represents 
mouse ES cell Hi-C data, bottom row represents cortex Hi-C data. a, PE-SCAn 
plots for known looping factors CTCF and cohesin (Smc1). b, PE-SCAn plots 
for promoter (H3K4me3) and active enhancer (H3K27ac) marks in mouse ES 
cells and cortex. c, PE-SCAn plots for active enhancer sites co-bound by cohesin 
(Smc1), Nanog, Sox2 or Oct4. d, PE-SCAn plots for genomic sites with active 
enhancer marks and Nanog binding, but which are devoid of cohesin. Left, 
mouse ES cell Hi-C data; right, cortex Hi-C data. 


of the positions of the high-density Nanog clusters showed that 
increased contact frequency was significantly enriched at these sites 
(P < 0.001), demonstrating that Nanog has a direct role in bringing 
together distantly located clusters of Nanog-binding sites. 
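
A circular permutation test of this kind can be sketched as follows. It is an illustrative example under stated assumptions: the statistic (mean contact gain at cluster bins), the set of shifts and the add-one correction of the empirical P value are choices made here, not details taken from the paper.

```python
def circular_permutation_p(signal, cluster_bins, shifts):
    """Empirical P value for enrichment of `signal` at `cluster_bins`.

    signal       -- per-bin contact gain along one chromosome
    cluster_bins -- bin indices of the high-density clusters
    shifts       -- offsets (in bins) used to rotate the cluster positions
    """
    n = len(signal)

    def mean_at(bins):
        # Wrap around the chromosome end so spacing between clusters is preserved.
        return sum(signal[b % n] for b in bins) / len(bins)

    observed = mean_at(cluster_bins)
    null = [mean_at([b + s for b in cluster_bins]) for s in shifts]
    exceed = sum(1 for value in null if value >= observed)
    return (exceed + 1) / (len(null) + 1)  # add-one correction (assumption)


# Toy chromosome of 100 bins with a gain only at the three cluster bins.
signal = [0.0] * 100
for b in (10, 40, 70):
    signal[b] = 5.0
p_value = circular_permutation_p(signal, [10, 40, 70], range(1, 100))
```

Because every shift moves all clusters together, the relative spacing of the clusters is kept and only their position relative to the signal changes.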

Our data show that pluripotency transcription factors shape the 
pluripotent genome via spatial intra- and interchromosomal gathering 
of high-density binding sites. It has been suggested previously that 
transcription factors position tissue-specific and co-regulated genes 
in somatic cells*’*’. However, in contrast to previous studies, we 
validated this concept by comparing genome-wide contact maps 
Figure 4 | Pluripotency factors influence the 3D organization of the 
genome. a, Immunoblot analysis before and after treatment of ZHBTc4 cells 
with 1 µg ml−1 doxycycline for the indicated times. Oct4 and Nanog proteins 
were detected using anti-Oct4 and anti-Nanog antibodies. UT, untreated. 

b, Morphology and GFP expression of RCNBH cells before and 72 h after 
treatment with 1 µM tamoxifen (4-OHT, 4-hydroxytamoxifen). c, 4C 
domainograms for Oct4-positive (−dox) and Oct4-negative (+dox) ZHBTc4 
cells (n = 2 biological replicates) show that overall chromosome topology is 
maintained in Oct4-depleted cells. d, Zoomed-in regions show 4C signal (reads 
per million) for Oct4-positive (top) and Oct4-negative (middle) cells. Bottom, the 
difference (Δ) between the 4C signal of the Oct4-positive and -negative cells. Red, 
purple and blue rectangles show high-density Nanog, Sox2 and Oct4 regions, 
respectively. e, Same as d but for Nanog conditional knockout (cKO) (n = 1). 
Note that gene information is left out at this scale. f, Chromosome-wide analysis 
of differential 4C interactions for the Nanog enhancer viewpoint. Loss of 4C 
contact frequency is defined as a lower 4C signal in the knockout compared to the 
non-depleted reference cell line. Loss of contact frequency is determined for high- 
density Oct4, Nanog, Sox2 and CTCF (control) clusters for the Oct4-ablated and 
Nanog-conditional knockout cell lines and the enrichment over the background 
is calculated (see Methods for details). g, Schematic drawing depicting the 
integration site of the lacO repeat cassette in the C57BL/6 allele of the Nfix locus 
and the targeting of GFP-LacR-Nanog fusion proteins. h, Domainograms 
showing allele-specific 4C (ref. 20) for the C57BL/6 (containing the lacO cassette) 
and the 129 allele present in the hybrid ES cells (n = 1). Bottom panels show 
zoomed-in 4C profiles (C57BL/6 green, 129 blue) for example differential regions. 
Red rectangles indicate high-density Nanog clusters (6 sites per 100 kb). 


generated in wild-type and transcription factor knockout cells and by 
studying an artificially induced cluster of binding sites. Our obser- 
vation that targeting or removing a given factor to or from the genome 
only changes specific contacts while the overall folding of chromosomes 
remains intact is in accordance with a recently proposed model 
for chromosome topology. This ‘dog-on-a-lead’ model predicts that 
chromosomes are dominant over their individual segments (genes, 
domains, enhancers) in dictating the overall shape of the genome, 
but that segments can search the nuclear subvolumes they occupy 
for preferred contact partners™*. There is accumulating evidence that 
stochastically determined nuclear environments can influence the 
transcriptional output of resident genes, leading to cell-to-cell variabil- 
ity>°. We propose that the observed spatial clustering of pluripotency 
factor binding sites in pluripotent stem cells enhances the transcription 
efficiency of nearby genes and thereby contributes to the robustness of 
the pluripotent state. 


METHODS SUMMARY 


4C sequencing and mapping. 4C sequencing was performed as previously 
described’. We used HindIII as the first restriction enzyme to generate the 3C 
template, which was further trimmed with DpnII. Sequencing was performed on 
Illumina GAII and HiSeq 2000 over multiple runs. Raw sequencing data and mapped 
wig files can be found under Gene Expression Omnibus (GEO) accession GSE37275. 
PE-SCAn. To assess which factors are associated with genome organization, we 
aligned ChIP data to the Hi-C data. For this, intrachromosomal captures that are 
>5 Mb from each other are aligned to transcription-factor-binding sites. Only 
captures where both di-tags mapped within 500 kb of a ChIP peak were considered 
in the analysis. As a result we get for every pair of ChIP peaks on the same 
chromosome a set of two distances (dx, dy), to all the Hi-C di-tags that are found 
within 500 kb of these peaks. From the distribution of dx and dy a frequency matrix 
is calculated with a bin size of 50 kb, which is normalized by dividing by a 
randomized data set that is calculated by aligning the Hi-C data to a circularly 
permuted ChIP-seq data set, that is, the ChIP peaks are linearly shifted 10 Mb 
along the chromosome. In this manner the structure of the Hi-C data is preserved; 
the structure of the ChIP data is also preserved, only shifted. 
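
The PE-SCAn procedure described above can be sketched for a single chromosome as follows. This is a simplified illustration and not the authors' implementation: the function names, the treatment of bins with an empty background (returned as NaN) and the wrap-around of shifted peaks at the chromosome end are assumptions.

```python
from bisect import bisect_left, bisect_right

FLANK = 500_000             # di-tags must map within 500 kb of a ChIP peak
BIN = 50_000                # bin size of the frequency matrix
MIN_SEP = 5_000_000         # keep only captures more than 5 Mb apart
N_BINS = 2 * FLANK // BIN   # 20 bins spanning -500 kb to +500 kb


def frequency_matrix(pairs, peaks):
    """Count (dx, dy) offsets of Hi-C di-tags relative to pairs of ChIP peaks."""
    peaks = sorted(peaks)
    matrix = [[0] * N_BINS for _ in range(N_BINS)]
    for a, b in pairs:
        if abs(a - b) <= MIN_SEP:
            continue
        near_a = peaks[bisect_left(peaks, a - FLANK):bisect_right(peaks, a + FLANK)]
        near_b = peaks[bisect_left(peaks, b - FLANK):bisect_right(peaks, b + FLANK)]
        for p in near_a:
            for q in near_b:
                dx, dy = a - p, b - q
                i = min((dx + FLANK) // BIN, N_BINS - 1)
                j = min((dy + FLANK) // BIN, N_BINS - 1)
                matrix[i][j] += 1
    return matrix


def pe_scan(pairs, peaks, chrom_length, shift=10_000_000):
    """Observed matrix divided by the matrix for peaks shifted along the chromosome."""
    observed = frequency_matrix(pairs, peaks)
    shifted = [(p + shift) % chrom_length for p in peaks]
    background = frequency_matrix(pairs, shifted)
    return [
        [o / b if b else float("nan") for o, b in zip(obs_row, bg_row)]
        for obs_row, bg_row in zip(observed, background)
    ]


# Toy input: one qualifying di-tag pair that sits exactly on two peaks.
toy = frequency_matrix(
    [(1_000_000, 11_000_000), (1_000_000, 3_000_000)],
    [1_000_000, 11_000_000],
)
```

A peak in the centre of the normalized matrix indicates that di-tags land on both binding sites of a pair more often than they do for the shifted control.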

Depletion of pluripotency factors. RCNBH cells were treated with tamoxifen and 
replated the next day. Seventy-two hours after initial tamoxifen treatment, cells 
were collected for 4C template preparation and analyses. ZHBTc4 cells were 
collected after 48 h of treatment with 1 µg ml−1 doxycycline. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 

Cell culture. E14 ES cells (129/Ola background) and C57BL/6-129 ES cells were 
grown in BRL-conditioned DMEM (high glucose, Gibco) supplemented with 15% 
FBS, 1× non-essential amino acids (NEAA; Gibco), 1× penicillin-streptomycin 
(Gibco), 1:1,000 β-mercaptoethanol (Invitrogen), 1% L-glutamine (Gibco) and 
1,000 U ml−1 leukaemia inhibitory factor (LIF; Gibco). Independently derived 
129/Cast ES cells (129SVJ/Castaneus background) were grown on irradiated 
mouse embryonic fibroblasts in DMEM supplemented with 15% FBS, 1× NEAA, 
1× penicillin-streptomycin, 1:1,000 β-mercaptoethanol and 1,000 U ml−1 LIF. 
RCNBH cells were cultured in GMEM, β-mercaptoethanol, 10% FCS and LIF 
as described previously'*°. ZHBTc4 (ref. 18) cells were cultured in GMEM, 
β-mercaptoethanol, 15% FCS, sodium bicarbonate and LIF. Culture medium was 
supplemented with 1 µg ml−1 doxycycline or 1 µM tetracycline when applicable. 
NPCs (E14 and 129/Cast) were grown in DMEM/F12 supplemented with 1:100 N2 
(Gibco), 20 ng ml−1 bFGF (Peprotech), 20 ng ml−1 murine EGF (Peprotech). For 
the 129/Cast NP cells 1× B-27 (Gibco) was added’. We generated astrocytes by 
growing E14NP cells to confluency and washing twice with DMEM before adding 
astrocyte medium (DMEM/F12 supplemented with 1:100 N2 and 2% FBS)*'. The 
culture medium was changed twice and cells were grown for 5 days to make sure 
differentiation was complete, which was confirmed by immunofluorescence. 
Generation of iPS cells. For generation of iPS cells, 10,000 129/Cast NPCs were 
seeded on gelatin-coated dishes in N2B27 medium (StemCell Resources). Cells 
were infected overnight with lentivirus expressing a multicistronic reprogramming 
cassette, encoding the iPS factors Oct4, Klf4, Sox2 and c-Myc”. After 5 days, cells 
were collected and plated on irradiated mouse embryonic fibroblasts. On day 6, 
N2B27 medium was replaced with mouse ES cell medium (DMEM with 15% 
FBS, 1× NEAA, 1× penicillin-streptomycin, 1:1,000 β-mercaptoethanol and 
1,000 U ml−1 LIF). iPS cell colonies were picked for clonal expansion on days 
20-22 after infection. At passage 11 after colony picking, proper iPS cell 
reprogramming was examined by qPCR analysis on a panel of marker genes on total 
RNA (pluripotency markers: Nanog, Zic3, Dppa4, Sall4, Cer1, Sox17 and Fgf5, 
neuronal lineage markers: Olig2, Nestin, Blbp and Glast). Cells were collected for 
4C at passage 11. 

siRNA knockdown of Nanog. For our knockdown experiments we used a pool of 
siRNA oligonucleotides targeting Nanog (M-057004-01) and a control pool con- 
taining non-targeting siRNAs (D-001206-13, siGENOME SMARTpool, Dharmacon). 
129/Cast ES cells were seeded without feeders in 100-mm culture dishes at ~20% 
confluency on the day before transfection. Cells were transfected according to 
the manufacturer’s protocol using 25 nM final siRNA concentration combined 
with 50 µl DharmaFECT 1. Transfection mixtures were added directly into the 
culture medium and plates were incubated at 37 °C with 5% CO2. Forty-eight 
hours after transfection, cells were collected for protein level analysis and 4C 
template preparation. 

Conditional ablation of Nanog and Oct4. RCNBH cells were treated with 
tamoxifen and replated the next day. Seventy-two hours after initial tamoxifen 
treatment, cells were collected for 4C template preparation and analyses. ZHBTc4 
cells were collected after 48 h of treatment with 1 µg ml−1 doxycycline. 

Protein analysis. Protein levels before and after conditional deletion were 
analysed in cells collected at the time points as described above. Immunoblot 
analysis was carried out on nuclear extracts that were made as described previously.
Extracts were subjected to SDS-PAGE, and proteins were transferred to a
methanol-activated PVDF membrane. Blots were blocked in blocking buffer (5% 
non-fat dry milk in TBST (50 mM Tris, pH 7.4, 150 mM NaCl, 0.1% Tween)) for 1 h
at room temperature or overnight at 4 °C, while tumbling. Primary antibody was
diluted in blocking buffer and incubated for 1-3 h at room temperature or overnight
at 4 °C, while tumbling. Blots were washed four times with TBST and incubated
with secondary antibody for 1 h in blocking buffer. Membranes were then incubated
with SuperSignal West Pro (Thermo Scientific) and digitally analysed using
an LAS 4000 ECL ImageQuant imager and ImageJ software. Antibodies used:
anti-Nanog (A300-397A, Bethyl Laboratories) at 1:5,000, anti-Oct4 (C30A3, Cell
Signaling Technology) at 1:1,000, anti-histone H3 (Abcam 1791) at 1:2,000.

Flow cytometry. Tamoxifen-treated and -untreated RCNBH cells were trypsinized 
and pellets were resuspended as single cells in regular ES cell medium at about 10⁶
per ml. For each condition, 50,000 live cells were analysed for GFP fluorescence,
using a Becton Dickinson FACSCalibur flow cytometer and FlowJo software.

Generation of lacO targeted cell line. Homology arms were excised (KpnI digest)
from bacterial artificial chromosome (BAC) RP24-136A15, and ligated into a low- 
copy bluescript plasmid. A total of 256 copies of a lacO array were inserted into a 
unique AatII site of the homology arms. F1 ES cells derived from C57BL/6 and 129
mouse strains were transfected with the linearized targeting construct by
electroporation. After 14 days of selection with neomycin, positive colonies were
picked and screened by Southern blotting. The GFP-LacR-Nanog construct was generated
in the backbone of pHAGE2-IRES-puro with an EF1α promoter. LacO cells were
stably transduced with the GFP-LacR-Nanog construct, and positive cells were
selected with puromycin for 10 days, after which cells were collected and tested for
purity by flow cytometry (90% GFP-positive). Allelic paired-end 4C technology
was performed as described previously, using HindIII-DpnII digestion and the following 4C
primers: 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACA 
CGACGCTCTTCCGATCGGAACTAAATGGAGGATC-3’ and 5'-CAAGCAG 
AAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTC 
CGATCTTACCAGGACCCCTGGGACCC-3’. 

3D-DNA FISH. 3D-DNA FISH for interchromosomal interaction was performed 
essentially as described in ref. 2. For slide preparation, ES cells were spotted on 
polylysine microscopy slides after which slides were washed in PBS. Cells were 
fixed in 3% paraformaldehyde/PBS and washed twice with PBS, after which cells 
were permeabilized on ice using ice-cold 0.5% Triton X-100 for 6 min. Slides were 
then washed for 3 min with 70% ethanol and stored in 70% ethanol at −20 °C. For
preparation of probes, 10 µl of each labelled BAC was combined with 5 µl mouse
Cot1 DNA and mixtures were speedvacced until pellets were dry. Pellets were
resuspended in 12.5 µl 50+ hybmix, incubated for 5 min at 95 °C, cooled on ice,
and incubated for 30 min at 37 °C. 

For FISH hybridization, slides were dehydrated for 3 min in 70% ethanol, 3 min 
in 90% ethanol, 3 min in 100% ethanol, after which slides were air-dried. One- 
hundred microliters of 70+ hybmix was then added to the dried slides, and slides 
were covered with a coverslip and incubated for 3 min at 85 °C. Slides were washed 
on ice, using ice-cold 2× saline-sodium citrate buffer (SSC) for 5 min, then using
ice-cold 70% ethanol for 5 min, after which slides were dehydrated again as
described above. After air drying, 10 µl probe was added and covered with a
coverslip and hybridizing slides were incubated overnight at 37 °C in a humid
box containing 50% formamide/2× SSC. After hybridization, slides were washed
in 2× SSC for 5 min, which also removes the coverslip. Subsequently, slides were
washed three times for 10 min in 50% formamide/2× SSC at 37 °C. Slides were
then dehydrated as described above, and air-dried slides were mounted using 40 µl
DAPI/Vectashield. Slides were covered with new coverslips and sealed with
transparent nail polish. We performed manual distance measurements in ImageJ using
the Image5D plugin. 

General 4C template information. For high-quality 4C experiments library
complexity is crucial; by applying 4C to 1 million genome equivalents (3 µg DNA), we
analyse a large number of ligation products per viewpoint. The generated DNA
contact profiles are therefore a true population average. The observed ligation
products are the result of spatial proximity. Note that these ligation products can
be a reflection of direct DNA contacts (such as promoter-enhancer interactions) or
indirect contacts mediated by large macromolecular complexes or nuclear particles. 

Experimental and primer design was carried out as previously described. For the
allele-specific 4C we used a paired-end 4C strategy. To this end, we designed
forward and reverse primers compatible with the Illumina flow cell. The forward
primer analyses the ligation product and the reverse primer is selected such that it
sequences an SNP that distinguishes the C57BL/6 allele from the 129S1/SvImJ
allele. After sequencing, this SNP is used to demultiplex the two alleles, to create 
two separate 4C profiles. 
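
The demultiplexing step can be sketched as follows. This is a minimal illustration and not the pipeline used for the data: the function name, the SNP offset within the reverse read and the base-to-allele mapping are all hypothetical placeholders.

```python
def demultiplex_by_snp(read_pairs, snp_offset, allele_bases):
    """Split paired-end 4C reads into per-allele lists of forward reads.

    read_pairs   : iterable of (forward_read, reverse_read) strings
    snp_offset   : 0-based position of the SNP in the reverse read (assumed known)
    allele_bases : dict mapping a base to an allele name (hypothetical values)
    """
    per_allele = {name: [] for name in allele_bases.values()}
    unassigned = []
    for forward, reverse in read_pairs:
        base = reverse[snp_offset] if len(reverse) > snp_offset else None
        allele = allele_bases.get(base)
        if allele is None:
            unassigned.append(forward)           # unexpected base or read too short
        else:
            per_allele[allele].append(forward)   # forward read carries the ligation product
    return per_allele, unassigned
```

Each per-allele list can then be mapped separately to give one 4C profile per allele; reads whose SNP base matches neither allele are kept apart rather than guessed.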

Definitions. To make this methods section clearer to non-experts we present the 
following definitions. Fragment: a genomic region (or sequence) that is generated 
after the first restriction. In this case, the first restriction enzyme, which generates
the 3C template, is always HindIII. Fragment end: to generate the 4C template, the 3C
template is further digested with a frequent cutter, in our case DpnII. The resulting
HindIII-DpnII restriction fragment is referred to as the fragment end, because this
restriction fragment represents the end of the 3C fragment. Capture frequency: 
captures are defined as the ligations in the 3C protocol resulting from 3D genome 
conformation. The 4C primers directly interrogate the ligation junction. Therefore 
the resulting capture frequency can be estimated from counting the number of 
reads coming from a given fragment end. 

4C sequencing and mapping. 4C sequencing was performed as previously
described. We used HindIII as the first restriction enzyme to generate the 3C
template, which was further trimmed with DpnII. Sequencing was performed on
Illumina GAII and HiSeq 2000 over multiple runs. The primer sequence (internal
barcode) was removed from each read and the trimmed reads were aligned to a
reduced genome consisting of sequences that flank HindIII restriction sites. The 
mouse mm9 genome was used as the reference genome for mapping 4C sequence 
captures. Non-unique sequences (repeats) that flank a restriction site were removed 
from the analysis. From the mapping a frequency distribution along the genome is 
calculated, which is the input for all downstream analyses. Raw sequencing data and 
mapped wig files can be found under GEO accession GSE37275. 
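
The counting logic behind the frequency distribution can be sketched as below. This is a simplified stand-in that assumes exact-match lookup instead of a real aligner; the function name, the `fragment_ends` dictionary and the lookup length `k` are illustrative assumptions.

```python
from collections import Counter

def count_captures(reads, primer, fragment_ends, k):
    """Count reads per fragment end after trimming the viewpoint primer.

    reads         : iterable of read strings
    primer        : primer sequence expected at the start of each read
    fragment_ends : dict mapping a k-mer flanking a HindIII site to a fragment-end id;
                    non-unique k-mers are assumed to have been removed beforehand
    k             : number of bases after the primer used for the lookup
    """
    counts = Counter()
    for read in reads:
        if not read.startswith(primer):
            continue                                  # not from this viewpoint
        captured = read[len(primer):len(primer) + k]  # trimmed capture sequence
        end_id = fragment_ends.get(captured)
        if end_id is not None:
            counts[end_id] += 1
    return counts
```

The resulting counts per fragment end correspond to the capture frequency defined above and would feed the downstream statistics.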

Statistical analysis of 4C data. Statistical analyses of 4C data (that is,
domainograms and target identification) were performed as described previously.
For formal definitions we refer the reader to those earlier articles. Here we briefly
describe the underlying principles of the data analysis. An inherent challenge of
4C data (and genome-wide chromosome capture data in general) is the highly
non-uniform
data distribution. Close to the viewpoint the signal is very high, whereas the signal 
rapidly decreases as a function of the distance from the viewpoint. Therefore, we 
statistically define significant interactions as regions that have an increased number 
of captures compared to the local background. To this end we must estimate the local 
background capture frequency. To minimize potential PCR artefacts we transform 
the 4C-seq read count at HindIII-DpnII fragment ends to binary data (that is,
captured or not captured). From this it is clear that 0s play an essential role in
determining significant interactions. Local background is then determined as the
frequency of captured fragment ends (1s) in a large window, typically 3,000
fragment ends. Following the binomial distribution, we can estimate μ and σ (for
details see ref. 10), which are used to determine a z-score for a window of
fragment ends of fixed size.
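
A minimal sketch of this calculation is given below, assuming the standard binomial mean μ = w·p and standard deviation σ = √(w·p·(1 − p)) for a window of w fragment ends with local background frequency p. The function name and the way the background window is centred are our assumptions; ref. 10 gives the formal definitions.

```python
import math

def window_z_scores(read_counts, w, background=3000):
    """z-score for every window of w consecutive fragment ends (binomial model)."""
    captured = [1 if c > 0 else 0 for c in read_counts]   # binarize: captured or not
    n = len(captured)
    prefix = [0]
    for b in captured:
        prefix.append(prefix[-1] + b)                      # prefix sums for window totals
    scores = []
    for start in range(n - w + 1):
        k = prefix[start + w] - prefix[start]              # captured ends in the window
        centre = start + w // 2
        lo = max(0, centre - background // 2)
        hi = min(n, lo + background)
        lo = max(0, hi - background)                       # keep the background window full
        p = (prefix[hi] - prefix[lo]) / (hi - lo)          # local background frequency
        sigma = math.sqrt(w * p * (1 - p))
        scores.append((k - w * p) / sigma if sigma > 0 else 0.0)
    return scores
```

Because only presence or absence of a capture is used, a fragment end with thousands of reads contributes no more than one with a single read, which is what limits the influence of PCR artefacts.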

To visualize the 4C data using domainograms, z-scores are calculated using
windows with a range of sizes (from 3 to 200). The z-scores are subsequently
transformed to P values with a one-tailed normal test. The −log10-transformed
P values are colour-coded and visualized along the linear chromosome. In this way,
regions with a high likelihood of interaction with the viewpoint can be visualized.
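
The transformation from z-scores to the values plotted in a domainogram can be sketched as follows. The function names and the dictionary layout (one row of z-scores per window size) are illustrative assumptions.

```python
import math

def neg_log10_p(z):
    """-log10 of the one-tailed (upper-tail) normal P value for a z-score."""
    p = 0.5 * math.erfc(z / math.sqrt(2))      # P(Z >= z) for a standard normal
    return -math.log10(p) if p > 0 else float("inf")

def domainogram(z_by_window_size):
    """Map {window_size: [z-scores]} to {window_size: [-log10 P]} rows for plotting."""
    return {w: [neg_log10_p(z) for z in zs] for w, zs in z_by_window_size.items()}
```

Each row of the returned dictionary corresponds to one window size; stacking the rows and colour-coding the values gives the domainogram.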

To distill discrete regions of significant interaction we choose a fixed window
size of 100 fragment ends and calculate the z-scores for this window size across the
chromosome. To select significant regions we determine the z-score threshold
based on an FDR level of 0.01. The FDR is determined based on the z-score
distribution in 100 randomly permuted chromosomes. The windows exceeding
the z-score threshold are selected as significantly contacted regions.
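
One way to derive such a threshold is sketched below. The text does not spell out the estimator, so this sketch assumes that the FDR at a cut-off t is the mean number of permuted windows with z ≥ t divided by the number of observed windows with z ≥ t. The permuted z-score sets would come from shuffling the binary capture vector (100 times in our case) and recomputing the window z-scores.

```python
from bisect import bisect_left

def fdr_threshold(observed_z, permuted_z_sets, fdr=0.01):
    """Smallest z-score cut-off whose estimated FDR is at most `fdr`.

    Returns None if no cut-off reaches the requested FDR.
    """
    obs = sorted(observed_z)
    perms = [sorted(zs) for zs in permuted_z_sets]
    for t in obs:                                       # candidate cut-offs, ascending
        n_obs = len(obs) - bisect_left(obs, t)          # observed windows with z >= t
        n_perm = sum(len(p) - bisect_left(p, t) for p in perms) / len(perms)
        if n_perm / n_obs <= fdr:
            return t
    return None
```

Windows with a z-score at or above the returned value would then be reported as significantly contacted regions.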

Analysis of 4C trans-interactions. In our data set we find highly specific
interchromosomal interactions. As for the intrachromosomal profiles, we calculate an
enrichment score over the background capture frequency. However, because the 
background capture frequency is distributed more or less uniformly across the 
chromosome, we can use a single background frequency per chromosome. The 4C 
enrichment score is calculated in the following way: 


$$E_{\mathrm{trans},i} = \frac{w\,p_{w,i} - (p_{\mathrm{chrom}}\,w)}{\sqrt{w\,p_{\mathrm{chrom}}\,(1 - p_{\mathrm{chrom}})}} \qquad (1)$$

in which $w$ is the window size and $i$ is the window index along the chromosome.
$p_{\mathrm{chrom}}$ is defined as follows:

$$p_{\mathrm{chrom}} = \frac{N_{\mathrm{captured}}}{N_{\mathrm{chrom}}} \qquad (2)$$

in which $N_{\mathrm{captured}}$ is the number of fragment ends captured on the chromosome
and $N_{\mathrm{chrom}}$ is the total number of fragment ends on the chromosome. $p_{w,i}$ is
defined as follows:

$$p_{w,i} = \frac{n_{w,i,\mathrm{captured}}}{w} \qquad (3)$$

in which $n_{w,i,\mathrm{captured}}$ is the number of fragment ends captured in genomic window $i$
of size $w$.

Windows with an $E_{\mathrm{trans},i}$ larger than 6 were chosen for subsequent analysis in
GREAT.
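
As an illustration, the trans enrichment score can be computed as in the sketch below. It assumes consecutive, non-overlapping windows, which the text does not specify; the function name is hypothetical.

```python
import math

def trans_enrichment(captured, w):
    """Enrichment score per window of w fragment ends on one trans chromosome.

    captured : list of 0/1 flags, one per fragment end (1 = captured)
    """
    p_chrom = sum(captured) / len(captured)        # chromosome-wide background frequency
    sd = math.sqrt(w * p_chrom * (1 - p_chrom))    # binomial standard deviation
    scores = []
    for start in range(0, len(captured) - w + 1, w):
        p_wi = sum(captured[start:start + w]) / w  # capture frequency in this window
        scores.append((w * p_wi - p_chrom * w) / sd if sd > 0 else 0.0)
    return scores
```

Selecting windows for downstream analysis then reduces to keeping the indices whose score exceeds 6.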

4C/Hi-C alignment to ChIP profiles. To test enrichment of 4C signal along ChIP
peaks we aligned the trans fragments to the nearest ChIP peak. We used several ES
cell ChIP-seq profiles from various sources. For Oct4, Sox2 and Nanog we used the
data described in ref. 36 (GSE11724). H3K27ac was taken from ref. 37
(GSE24165). We also used Smc1 data (GSE22557), H3K4me3 (ref. 39; GSE12241), and
RNA PolII and CTCF (mES cell and cortex), H3K27ac and H3K4me3 (cortex)
(GSE29218).

4C data were binarized because in trans the capture frequency is so low that read
counts more likely represent differences in PCR efficiency rather than genuine
unique captures. For the binarized data the distance to the nearest ChIP peak 
was calculated. To calculate enrichment scores, the distances to the nearest ChIP 
peak were sorted (that is, aligned) and a sliding average was calculated. The 
window size of the sliding average was set to 1% of the total data set. 
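
The alignment and smoothing can be sketched as follows. We assume here that the sliding average is taken over the binary capture signal after sorting fragment ends by their distance to the nearest peak; this reading, and the function names, are assumptions for illustration.

```python
from bisect import bisect_left

def nearest_peak_distance(pos, peaks):
    """Distance from pos to the closest entry of the sorted, non-empty list peaks."""
    j = bisect_left(peaks, pos)
    candidates = peaks[max(0, j - 1):j + 1]      # neighbours on either side of pos
    return min(abs(pos - p) for p in candidates)

def enrichment_profile(fragment_ends, peaks, fraction=0.01):
    """Sort fragment ends by distance to the nearest peak and smooth the capture signal.

    fragment_ends : list of (position, captured) with captured in {0, 1}
    Returns (mean distance, mean capture frequency) for each sliding window.
    """
    peaks = sorted(peaks)
    rows = sorted((nearest_peak_distance(pos, peaks), cap) for pos, cap in fragment_ends)
    win = max(1, int(len(rows) * fraction))      # window = 1% of the data set by default
    profile = []
    for s in range(len(rows) - win + 1):
        chunk = rows[s:s + win]
        profile.append((sum(d for d, _ in chunk) / win, sum(c for _, c in chunk) / win))
    return profile
```

A capture frequency that is highest at small distances and decays with distance would indicate enrichment of the 4C signal around the ChIP peaks.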

Hi-C normalization and analysis. Hi-C data were downloaded from GEO (accession
GSE35156). We removed all read pairs that are mapped within 500 bp of each
other on the chromosome, because these read pairs are probably genomic back- 
ground sequence, rather than bona fide Hi-C captures. For the virtual 4C and 
disorganization analysis we average the data to bins of 100 kb, which results in a 
matrix of pairwise capture frequencies between all the genomic bins. A proper 
analysis of the Hi-C data requires that we correct the Hi-C matrix for genomic 
biases. For this normalization we assume that the capture probability in a given 
genomic bin is dependent on the number of restriction sites in this bin. The strong 
positive correlation between the restriction site density and the number of captures 
for that given bin is evidence that this assumption is correct (data not shown). We 
therefore normalize the bins by dividing by the capture probability. First we
calculate the restriction density in 100-kb bins along the chromosome, which gives
us a capture probability for a given bin ($P_{\mathrm{capture},i}$). The capture probability
between two bins on the chromosome ($P_{\mathrm{capture},i,j}$) can now be calculated by taking
the product of the capture probabilities of the two single bins
($P_{\mathrm{capture},i}$, $P_{\mathrm{capture},j}$):

$$P_{\mathrm{capture},i,j} = P_{\mathrm{capture},i} \times P_{\mathrm{capture},j}$$

Before normalization the correlation between the diagonals of the Hi-C matrices
for the NcoI and HindIII experiments from mouse ES cells is 0.32. However,
after normalization this correlation rises to 0.86.
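
A sketch of this normalization is shown below. It assumes that the single-bin capture probability is simply the fraction of all restriction sites that fall in that bin; the text states only that the probability depends on the number of sites, so the exact form is our assumption, as is the function name.

```python
def normalize_hic(matrix, site_counts):
    """Divide each Hi-C bin pair by the product of the single-bin capture probabilities.

    matrix      : square list of lists with raw capture counts between bins
    site_counts : number of restriction sites in each bin
    """
    total = sum(site_counts)
    p_capture = [c / total for c in site_counts]       # per-bin capture probability
    n = len(matrix)
    normalized = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            p_ij = p_capture[i] * p_capture[j]         # pairwise capture probability
            normalized[i][j] = matrix[i][j] / p_ij if p_ij > 0 else 0.0
    return normalized
```

Bins without restriction sites cannot be captured and are set to zero rather than divided by zero.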

For the virtual 4C based on the Hi-C data, we combine the three normalized 
matrices (2× HindIII, 1× NcoI). Because the data are too sparse to perform a
virtual 4C analysis for a single fragment, we analyse a single row from the Hi-C 
interaction matrix. For comparison, we also calculate the average 4C signal in 
100-kb genomic bins. 

For the analysis of genomic disorganization we use the two HindIII experiments 
for mouse ES cell and cortex (GSE35156). In this analysis we want to compare the 
propensity of active and inactive regions to contact regions over large genomic 
distances. To this end we segment the chromosomes in active and inactive bins of 
100 kb, based on the density of H3K4me1 sites. On the basis of this segmentation
we can create a matrix similar in size to the Hi-C matrix. In this matrix three
classes of interaction bins can be created: (1) H3K4me1 high in both: interaction
bin between two active genomic regions; (2) H3K4me1 low in both: interaction bin
between two inactive genomic regions; and (3) H3K4me1 low/H3K4me1 high:
interaction bin between an active and an inactive genomic region.

Because we perform a 50/50 segmentation, the 'H3K4me1 high in both' and
'H3K4me1 low in both' classes will each comprise 25% of the interaction bins, and
the 'H3K4me1 low/H3K4me1 high' class will comprise 50% of the interaction bins.
In addition, the Hi-C matrix is segmented into
high-contact and low-contact bins by setting an arbitrary threshold (75% quantile
value of the entire matrix). Next, we overlay the segmented Hi-C matrix and the
contact bins to determine the number of long-range contacts made for each of the
classes. We use various minimal distance cut-offs running from 10 to 70 Mb with
step sizes of 10 Mb. This process is schematically explained in Fig. 1e.
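
The overlay can be sketched as follows. For ease of comparison between classes the sketch reports the fraction of high-contact bin pairs per class rather than raw counts; that choice, the class labels and the function name are ours.

```python
def class_contact_fractions(hic, active, min_bins, threshold):
    """Fraction of high-contact bin pairs per activity class beyond a distance cut-off.

    hic       : square matrix (list of lists) of normalized contact values
    active    : list of booleans, True for bins in the upper half of H3K4me1 density
    min_bins  : minimal separation between the two bins, in number of bins
    threshold : contact value above which a bin pair counts as high-contact
    """
    totals = {"active-active": 0, "inactive-inactive": 0, "mixed": 0}
    high = dict(totals)
    n = len(hic)
    for i in range(n):
        for j in range(i + min_bins, n):               # upper triangle, long range only
            if active[i] and active[j]:
                cls = "active-active"
            elif not active[i] and not active[j]:
                cls = "inactive-inactive"
            else:
                cls = "mixed"
            totals[cls] += 1
            if hic[i][j] > threshold:
                high[cls] += 1
    return {c: (high[c] / totals[c] if totals[c] else 0.0) for c in totals}
```

With 100-kb bins, distance cut-offs of 10 to 70 Mb correspond to `min_bins` values of 100 to 700 in steps of 100.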

Alignment of Hi-C data to ChIP peaks (PE-SCAn). To assess which factors are
associated with genome organization, we aligned ChIP data to the Hi-C data. To
this end we selected the intrachromosomal captures; however, because of the
strongly non-uniform distribution we removed the captures that lie within 5 Mb 
of each other. This has the effect that we only analyse interactions between, rather 
than within, topological domains. The Hi-C pairs were aligned to the ChIP data in 
two iterations. First, one end of the paired reads was aligned to the ChIP data. Only 
reads that mapped within 500 kb up- or downstream of the ChIP peaks were 
selected for further analysis. Of this reduced set the corresponding read was also 
aligned to the ChIP peaks within 500 kb. As a result, for every intrachromosomal
pair of ChIP peaks we obtain a set of two distances (dx, dy) to all the Hi-C ditags
that are found within 500 kb of these peaks. From the distribution of dx and dy a
frequency matrix is calculated, which is the result of our two-dimensional align- 
ment, with a bin size of 50 kb. To calculate whether the binding sites of a given 
factor show preferential spatial contacts, we calculate an enrichment score over a 
randomized data set. The randomized data set is calculated by aligning the Hi-C 
data to a circularly permuted ChIP-seq data set, that is, the ChIP peaks are linearly
shifted 10 Mb along the chromosome. It is important to note that in this manner
the structure of the Hi-C data is preserved; the structure of the ChIP data is also
preserved, only shifted. The resulting frequency matrix serves as an internal
normalization for the observed Hi-C data alignment scores.
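
The two-dimensional alignment and its shifted control can be sketched as below. This is a brute-force illustration for a single chromosome, not an efficient implementation; the function names, the handling of empty control bins and the explicit `chrom_length` argument used for wrap-around are assumptions.

```python
def pescan_matrix(ditags, peaks, flank=500_000, bin_size=50_000, min_sep=5_000_000):
    """2D histogram of Hi-C ditag offsets (dx, dy) around pairs of ChIP peaks.

    ditags : list of (pos1, pos2) intrachromosomal read-pair positions
    peaks  : list of ChIP peak positions on the same chromosome
    """
    n_bins = 2 * flank // bin_size
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in ditags:
        if abs(a - b) < min_sep:
            continue                                   # drop short-range captures
        for px in peaks:
            dx = a - px
            if not -flank <= dx < flank:
                continue                               # first end must lie near a peak
            for py in peaks:
                dy = b - py
                if -flank <= dy < flank:               # second end near another peak
                    counts[(dx + flank) // bin_size][(dy + flank) // bin_size] += 1
    return counts

def pescan_enrichment(ditags, peaks, chrom_length, shift=10_000_000, **kw):
    """Observed / shifted-control ratio per bin (None where the control bin is empty)."""
    obs = pescan_matrix(ditags, peaks, **kw)
    shifted = [(p + shift) % chrom_length for p in peaks]   # circular permutation
    ctrl = pescan_matrix(ditags, shifted, **kw)
    return [[o / c if c else None for o, c in zip(ro, rc)] for ro, rc in zip(obs, ctrl)]
```

A factor whose binding sites contact each other preferentially would show ratios above 1 concentrated in the central bins, where dx and dy are both close to zero.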

Mechanism of MEK inhibition determines efficacy in

KRAS and BRAF activating mutations drive tumorigenesis through
constitutive activation of the MAPK pathway. As these tumours repre- 
sent an area of high unmet medical need, multiple allosteric MEK 
inhibitors, which inhibit MAPK signalling in both genotypes, are 
being tested in clinical trials. Impressive single-agent activity in 
BRAF-mutant melanoma has been observed; however, efficacy has 
been far less robust in KRAS-mutant disease. Here we show that,
owing to distinct mechanisms regulating MEK activation in KRAS-
versus BRAF-driven tumours, different mechanisms of inhibition
are required for optimal antitumour activity in each genotype. 
Structural and functional analysis illustrates that MEK inhibitors 
with superior efficacy in KRAS-driven tumours (GDC-0623 and 
G-573, the former currently in phase I clinical trials) form a strong 
hydrogen-bond interaction with S212 in MEK that is critical for
blocking MEK feedback phosphorylation by wild-type RAF. 
Conversely, potent inhibition of active, phosphorylated MEK is 
required for strong inhibition of the MAPK pathway in BRAF- 
mutant tumours, resulting in superior efficacy in this genotype with 
GDC-0973 (also known as cobimetinib), a MEK inhibitor currently 
in phase III clinical trials. Our study highlights that differences in 
the activation state of MEK in KRAS-mutant tumours versus BRAF- 
mutant tumours can be exploited through the design of inhibitors 
that uniquely target these distinct activation states of MEK. These 
inhibitors are currently being evaluated in clinical trials to deter- 
mine whether improvements in therapeutic index within KRAS 
versus BRAF preclinical models translate to improved clinical res- 
ponses in patients. 

The MAPK/ERK kinase (MEK) signalling cascade is a key regulator 
of cellular proliferation, differentiation and survival downstream of 
RAS activation. Upregulation of this pathway occurs in a large fraction
of tumours, frequently owing to oncogenic activating mutations
in KRAS, NRAS, HRAS and BRAF. Whereas BRAF inhibitors have
shown remarkable efficacy against melanomas with BRAF(V600E) mutations,
these compounds are not effective against KRAS-mutant tumours
owing to inhibitor-mediated priming of wild-type RAF signalling.
As MEK is a common effector downstream of wild-type and mutant
RAF, MEK inhibitors have the potential to target all tumours dependent
on MAPK pathway signalling. Hence, MEK inhibitors show efficacy
against both BRAF- and KRAS-mutant cancer cell lines and xenograft
models. Although the sensitivity of BRAF-mutant models is typically
greater than that of KRAS-mutant models, the relative potency
between the two genotypes varies for different inhibitors. This
difference is evident when comparing three allosteric MEK inhibitors
in tumour models that have mutations in either BRAF or KRAS (Table 1
and Fig. 1). All three inhibitors are potent, ATP-uncompetitive inhibitors
of MEK1 but show distinct shifts in cellular activity, with GDC-0973
having 100-fold weaker potency against a KRAS-mutant versus BRAF-mutant
cell line, whereas GDC-0623 and G-573 show only 6- and
15-fold half-maximum effective concentration (EC50) decreases, respectively
(Table 1 and Supplementary Fig. 1). We profiled the MEK inhibitors
across a panel of BRAF and KRAS-mutant cancer cell lines (Fig. 1a
and Supplementary Table 1) and consistently found that GDC-0973 was
less potent in KRAS versus BRAF-mutant cell lines, whereas GDC-0623
and G-573 had similar efficacy in the two genotypes. We tested clinically
relevant doses of GDC-0973 and GDC-0623: the average plasma concentration
of GDC-0973 over 24 h at its maximum tolerated dose (MTD)
is 0.37 µM and for GDC-0623 is 0.63 µM (Musib et al., manuscripts in
preparation).
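
The fold shifts quoted above follow from the proliferation EC50 values in Table 1. A short check of the arithmetic, using the tabulated values (variable and function names are illustrative only):

```python
# Proliferation EC50 values (µM) from Table 1:
# compound -> (A375 BRAF(V600E), HCT116 KRAS(G13D))
ec50_um = {
    "GDC-0973": (0.005, 0.520),
    "GDC-0623": (0.007, 0.042),
    "G-573": (0.011, 0.167),
}

def fold_shift(braf_ec50, kras_ec50):
    """KRAS/BRAF EC50 ratio; larger values mean weaker relative potency in KRAS-mutant cells."""
    return kras_ec50 / braf_ec50

# Rounded to whole numbers, as reported in the last column of Table 1.
shifts = {name: round(fold_shift(*vals)) for name, vals in ec50_um.items()}
```

The ratios reproduce the last column of Table 1: roughly 100-fold for GDC-0973 compared with 6-fold and 15-fold for GDC-0623 and G-573.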

To test whether these differences translate to in vivo tumour models, 
we compared GDC-0973, GDC-0623 and G-573 in three KRAS-mutant 
xenograft models (Fig. 1b and Supplementary Fig. 2). Compounds were 
evaluated at their MTDs, and superior antitumour activity was observed 
with GDC-0623 and G-573 compared to GDC-0973 in all three KRAS 
models. 

Treatment of KRAS-mutant cells with MEK inhibitors is expected to
lead to an increase in phosphorylated MEK (pMEK) through feedback-mediated
RAF activation. We observed the expected increase in
pMEK in response to GDC-0973 (Fig. 2a, top panel), but not GDC-0623
or G-573. This block of pMEK accumulation resulted in more
effective inhibition of pERK by GDC-0623 and G-573. Release of
negative feedback mediators occurred normally in response to all inhibitors
(Supplementary Fig. 3). In a BRAF(V600E) cell line, where wild-type
RAF activity is low, all inhibitors reduced pMEK levels (Fig. 2a,
lower panel). Knockdown experiments revealed that BRAF and
CRAF, but not ARAF, were responsible for pMEK induction (Supplementary
Fig. 4), in agreement with BRAF-CRAF heterodimers
being the major mediators of signalling through activated KRAS.

Given that GDC-0623 and G-573 prevent MEK phosphorylation in 
cells, we tested whether these compounds directly inhibit RAF activity 
in vitro. Only GDC-0623 and G-573 were able to prevent MEK phos- 
phorylation by CRAF in vitro (Fig. 2b, upper panel), whereas all three 
inhibitors were able to block MEK phosphorylation by BRAF(V600E) 
(Fig. 2b, lower panel). The inhibitors did not block phosphorylation of 
MEK(K97R) (Supplementary Fig. 5a), which is kinase dead and does 
not effectively bind the inhibitors (Supplementary Figs 5b and 10). 
These studies confirm that the MEK inhibitors do not block RAF acti- 
vation directly, but rather through their effects on MEK. Furthermore, 
none of the inhibitors had off-target effects against other kinases (Sup- 
plementary Table 3). 

Table 1 | Biochemical and cellular potency of GDC-0973, GDC-0623 and G-573

Compound    MEK1 (−ATP)   MEK1 (+ATP)   Proliferation EC50 (µM)                      KRAS(G13D)/
            Ki (nM)       Ki (nM)       A375 (BRAF(V600E))   HCT116 (KRAS(G13D))     BRAF(V600E) EC50
GDC-0973    262.5         0.05          0.005                0.520                   104
GDC-0623    2175          0.13          0.007                0.042                   6
G-573       196.3         0.30          0.011                0.167                   15

Because the MEK inhibitors affect RAF activity, we proposed that
they might affect RAF-MEK protein interactions. In vitro dimerization
assays revealed that GDC-0623 and G-573 stabilize the RAF-MEK
complex whereas GDC-0973 does not (Fig. 2c and Supplementary 
Fig. 6a, b). RAF-MEK complex formation was specific to wild-type 
RAF and not BRAF(V600E), which does not form stable complexes in 
the presence of inhibitors (Fig. 2d and Supplementary Fig. 6a, b). These 
in vitro findings were confirmed in cells. In KRAS-mutant cells GDC- 
0623, and more markedly, G-573, induced dimerization of MEK with 
both BRAF and CRAF, whereas GDC-0973 reduced baseline hetero- 
dimer levels (Fig. 2e), coincident with elevated pMEK levels (Fig. 2f, 
top panel, and Supplementary Fig. 7a, b). Consistent with the biochemi- 
cal data, G-573 did not affect the basal BRAF(V600E)-MEK complex 
in BRAF(V600E) homozygous cells, whereas it was able to effectively 
stabilize the CRAF-MEK complex (Supplementary Fig. 6c, d). 

We theorized that RAF-MEK complex stabilization by GDC-0623 
and G-573 may disfavour BRAF-CRAF complex formation in response 
to MEK inhibitors, and thus dampen feedback-induced signalling. 
Indeed, only GDC-0973 promoted BRAF-CRAF heterodimer forma- 
tion (Fig. 2f, lower panel, and Supplementary Fig. 7b). Furthermore, 
whereas GDC-0973 induced CRAF translocation to the plasma membrane
as previously described, G-573 treatment blocked it (Supplementary
Fig. 7c). Thus, a subset of MEK inhibitors, in addition to blocking
MEK activation by RAF, also block RAF activation through inhibition 
of BRAF-CRAF heterodimer formation and RAF plasma membrane 
translocation.

Figure 1 | Allosteric MEK inhibitors show distinct relative potencies in
KRAS-mutant and BRAF-mutant cancer models. a, EC50 values (µM) in a
cell viability assay of indicated MEK inhibitors against a panel of BRAF(V600E)
and KRAS-mutant cancer cell lines. *EC50 values >1 µM. Red arrows:
MiaPaCa-2 (left), H2122 (right). b, Antitumour efficacy of GDC-0973 in the 
MiaPaCa-2 and H2122 xenograft models, GDC-0623 in MiaPaCa-2 (n = 5 per 
group) or G-573 in H2122 (n = 10 per group). Percent tumour growth 
inhibition (%TGI) was 78% for GDC-0973 and 120% for GDC-0623 in 
MiaPaCa-2. GDC-0973 had no partial or complete regressions (PR/CR), 
whereas GDC-0623 had 3 PR, 2 CR. In H2122 xenografts, GDC-0973 treatment 
led to 80% TGI, 1 PR, 0 CR; G-573 treatment led to 118% TGI, 3 PR, 0 CR.
Group mean is plotted with error bars representing the standard error of mean 
(s.e.m.). PO QD, by mouth once a day. 


Figure 2 | MEK inhibitor efficacy in KRAS-mutant models correlates with
the ability to block feedback-induced phosphorylation of MEK by RAF.
a, Western blot (WB) of phosphorylated MEK (pMEK) and total MEK (tMEK)
in HCT116 KRAS(G13D) cells (top) and A375 BRAF(V600E) cells (bottom)
after overnight treatment with inhibitors (range: 0.00045-0.33 µM in threefold
increments) shows pMEK induction selectively by GDC-0973 in HCT116 cells.
b, In vitro cascade kinase assays using recombinant inactive MEK and ERK and
recombinant kinase domains of either CRAF (top) or BRAF(V600E) (bottom)
in the presence of 0.01, 0.1 and 1 µM of the indicated MEK inhibitors.
c, d, Recombinant kinase domain (KD) CRAF^KD-MEK dimer formation (c) or
BRAF(V600E)^KD-MEK dimer formation (d) detected by homogeneous
time-resolved fluorescence in vitro, either in the absence (No ATP control) or
presence of 100 µM ATP with and without MEK inhibitors (GDC-0973, GDC-0623,
G-573, DMSO). e, Immunoprecipitation (IP)/WB of endogenous MEK1
from lysates of HCT116 (KRAS(G13D)) cells treated overnight with DMSO or
indicated MEK inhibitors at 0.1, 1, 10 µM. f, IP/WB of endogenous BRAF from
lysates of HCT116 (KRAS(G13D)) cells treated overnight with DMSO or
indicated MEK inhibitors at 0.1, 1, 10 µM (lower panel). Immunoprecipitates
were immunoblotted for BRAF and CRAF, showing that GDC-0973 promotes
BRAF-CRAF complex formation. Total lysates were immunoblotted for
pMEK, pERK and tMEK (upper panel).

We obtained a crystal structure of GDC-0973 with the MEK1 kinase
domain and compared its binding mode to that of GDC-0623 and G-573
based on molecular modelling (Fig. 3a and Supplementary Fig. 10). The
three inhibitors bind in a similar orientation with respect to the nucleotide
and the activation loop (A-loop) helix that contains S218, one of the
two MEK sites phosphorylated by RAF. S222, the second phosphorylation
site, lies in a part of the activation loop that is not well defined in the
crystal structure. In our structure/models, the inhibitors form an
interaction with the backbone amide of S212, at the beginning of
the A-loop helix. Whereas GDC-0623 and G-573 are predicted
to engage S212 with an aromatic nitrogen (N) to form a hydrogen
bond, GDC-0973 engages S212 with an aromatic fluorine (F) forming
a much weaker interaction, given the high electronegativity and low
polarizability of F. The importance of this interaction for inhibitor
activity was highlighted by disrupting it through substitution of a
carbon at this site, which abolished activity (Supplementary Fig. 8).
We proposed that the interaction with the S212 backbone might
constrain the movement of the A-loop helix and prevent MEK phosphorylation
by RAF. To test this, we mutated S212 to proline to disrupt
the hydrogen bond with the inhibitors (Supplementary Fig. 9). Based on
modelling, S212P also disrupts the hydrogen-bond network between
the S212 side chain, the A-loop helix, and E114 in the α-C helix, and is
predicted to allow greater accessibility of the A-loop to RAF (Supplementary
Fig. 9). In contrast, an S212A mutation should not affect the inhibitor
hydrogen-bond interaction, but would affect the hydrogen-bond
network between S212, the A-loop helix and E114 and lead to increased
A-loop helix dynamics and higher basal MEK activity, as previously
reported. In cellular studies, transfected wild-type MEK1 became
highly phosphorylated in the presence of GDC-0973 (Fig. 3b). The
S212P and S212A mutants showed higher basal pMEK levels, consistent
with the disruption of the S212/A-loop/C-helix interactions. As
predicted, GDC-0623 and G-573 were unable to inhibit the phosphorylation
of MEK1(S212P) but were able to block the phosphorylation of
MEK1(S212A). Our model suggests this is because of the hydrogen
bonds to the alanine backbone that are lost with the proline backbone
(Fig. 3b and Supplementary Fig. 9c). We propose that the stronger interaction
of GDC-0623 and G-573 with S212 reduces the flexibility of the
MEK A-loop helix, thus preventing RAF from accessing S218/S222.

Figure 3 | Distinct inhibitor interactions with MEK determine their
individual effects on RAF-MEK complex formation, MEK activation and
inhibition of ERK phosphorylation. a, MEK1 crystal structure of GDC-0973
(magenta) with models of GDC-0623 (yellow) and G-573 (green) shows key
hydrogen-bond interactions of inhibitors with S212 on the MEK activation
segment. S222 is shown in its approximate location based on modelled
conformations. b, WB of lysates from HCT116 MEK1 null cells transiently
transfected with wild-type MEK1 (WT), S212P or S212A mutant MEK1 and
treated with 1 µM of the indicated MEK inhibitors for 24 h. c, Upper panel, WB
of HCT116 (KRAS(G13D)) cells treated for 24 h with the indicated MEK
inhibitors (0.1, 1 µM); lower panel, IP/WB of endogenous MEK1 from lysates
of HCT116 (KRAS(G13D)) cells treated overnight with DMSO or indicated
inhibitors. d, Recombinant protein pull-down assays using wild-type or S212P
MEK after incubation with recombinant CRAF in the presence or absence of
ATP (100 µM) and the indicated inhibitors.

We evaluated additional compounds with an N (G-530) or F (PD-901)
at this position (Supplementary Fig. 10). As predicted by our model, 
G-530 prevented pMEK accumulation and stabilized a RAF-MEK 
complex, whereas PD-901 led to increased pMEK and lower levels of 
RAF-MEK dimers (Fig. 3c). In biochemical studies, RAF-MEK com- 
plexes stabilized by G-573 and G-530 were lost upon introduction of a 
S212P mutation in MEK (Fig. 3d). Additionally, the relative potencies
of PD-901 and G-530 in KRAS versus BRAF-mutant cells track with 
those of GDC-0973 and GDC-0623/G-573, respectively (Supplemen- 
tary Fig. 11). These data support the conclusion that a strong hydrogen 
bond between MEK inhibitors and S212 is important for stabilizing a
RAF-MEK complex and blocking RAF-mediated MEK phosphoryla- 
tion in KRAS-mutant cells. 

Interestingly, the clinical MEK inhibitor AZD6244, which has an N
rather than an F at the S212-interacting site, was previously reported to
have lower potency in a panel of oncogenic KRAS mutant cell lines
(KRAS(mut)) versus BRAF(V600E) cells and to allow the induction of
pMEK in KRAS(mut) cells. Despite the presence of an N at this position,
AZD6244 is predicted to form a weaker hydrogen bond with S212
compared to GDC-0623 due to a longer hydrogen-bond distance from
the S212 backbone NH (Supplementary Fig. 12). This leads us to refine
our model by concluding that it is the strength of the inhibitor-MEK
S212 interaction, rather than the exact identity of the atom interacting
with MEK, that is critical for the ability of MEK inhibitors to block
pMEK induction and show strong potency in KRAS-mutant cells.

Importantly, whereas GDC-0973 showed reduced efficacy in KRAS- 
mutant in vivo models compared to GDC-0623 and G-573, the reverse 
was found in BRAF(V600E) mutant models (Fig. 4a). In BRAF(V600E) 
xenografts, GDC-0973 showed the greatest efficacy at its MTD, with an 
increased rate of regressions. Because the rank order of compounds in 
efficacy studies was reversed in KRAS versus BRAF-mutant models, 
pharmacokinetic differences between the molecules do not account for 
these differences. 

Given that BRAF(V600E) tumour models have high basal levels of
pMEK (Supplementary Fig. 13a), we proposed that GDC-0973 may
have a higher binding affinity for activated pMEK, and thus greater
efficacy in BRAF(V600E) cells, than MEK inhibitors that preferentially
bind to the inactive form of MEK. To test this directly,
we generated a phosphomimetic construct encoding constitutively 
active MEK(S218D S222D) (hereafter MEK-DD). In fluorescence
polarization assays, GDC-0623 and G-573 bound tenfold less strongly
to MEK-DD than wild-type MEK, whereas GDC-0973 bound equiva- 
lently to both forms (Fig. 4b). We proposed that the reduction in 
binding affinity for GDC-0623 and G-573 against MEK-DD might 
be due to the weakening of their hydrogen-bond interaction with 
S212 as a result of increased dynamics of the A-loop and α-C helix
in the MEK-DD conformation. To test this, we introduced an additional
mutation, E114D, to the MEK-DD protein, predicted to further
increase A-loop/α-C helix dynamics and further weaken the S212 interaction
with inhibitors. As predicted, GDC-0623 and G-573 lost substantial
binding affinity against MEK-DD(E114D), whereas GDC-0973
was minimally affected. We attribute the strong binding of GDC-0973 
to the additional interactions it makes with the nucleotide and with 
residues within the MEK HRD motif of the catalytic loop (Fig. 4c;
Supplementary Fig. 10), and suggest that GDC-0973 is less affected by
A-loop/α-C helix dynamics such as occur during MEK activation.

Figure 4 | Potency of MEK inhibitors in BRAF-mutant versus KRAS-mutant
tumours correlates with high inhibitor binding affinity against
dually phosphorylated MEK (S218/S222) compared to un-phosphorylated MEK.
a, Antitumour efficacy of GDC-0973 in A375 and Colo205 xenograft models,
GDC-0623 in A375 (n = 5 per group) or G-573 in Colo205 (n = 10 per
group). In A375, GDC-0973 treatment led to 127% TGI, with 3 PR, 2 CR;
GDC-0623 treatment led to 102% TGI with 3 PR, 0 CR. In Colo205,
GDC-0973 treatment led to 141% TGI with 4 PR, 6 CR;
G-573 treatment led to 109% TGI with 3 PR, 0 CR.
Group mean is plotted with error bars representing
standard error of mean (SEM). CRC, colorectal
cancer. b, Binding curves of GDC-0973, GDC-0623
and G-573 against MEK1 WT, MEK-DD and MEK-DD(E114D)
using fluorescence anisotropy (FA). Ki
(nM) for inhibitor are GDC-0973: 0.05, 0.18, 0.48;
GDC-0623: 0.13, 1.33, ~62.0; G-573: 0.30, 3.53,
~284.0 for the MEK WT, MEK-DD and MEK-DD(E114D)
constructs, respectively. c, MEK1
crystal structure of GDC-0973 (magenta) with a
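
The fold-loss in binding affinity discussed here can be checked against the Ki values listed in the Fig. 4b legend. The following is a minimal Python sketch and not part of the original analysis: the dictionary layout and the helper name `fold_change_vs_wt` are illustrative assumptions, and the values the legend marks as approximate (~62.0 and ~284.0) are treated as exact numbers.

```python
# Ki values (nM) transcribed from the Fig. 4b legend, ordered as
# MEK WT, MEK-DD, MEK-DD(E114D). The legend reports 62.0 and 284.0 as
# approximate (~); treating them as exact is an assumption of this sketch.
KI_NM = {
    "GDC-0973": (0.05, 0.18, 0.48),
    "GDC-0623": (0.13, 1.33, 62.0),
    "G-573": (0.30, 3.53, 284.0),
}
CONSTRUCTS = ("MEK WT", "MEK-DD", "MEK-DD(E114D)")


def fold_change_vs_wt(ki_values):
    """Divide each construct's Ki by the wild-type Ki (hypothetical helper)."""
    wild_type_ki = ki_values[0]
    return tuple(ki / wild_type_ki for ki in ki_values)


# Fold change in Ki relative to wild-type MEK for every inhibitor.
FOLD = {name: fold_change_vs_wt(values) for name, values in KI_NM.items()}

if __name__ == "__main__":
    for name, folds in FOLD.items():
        summary = ", ".join(
            f"{construct}: {fold:.1f}x"
            for construct, fold in zip(CONSTRUCTS, folds)
        )
        print(f"{name} -> {summary}")
```

On these numbers GDC-0623 and G-573 lose roughly tenfold affinity against MEK-DD and several hundred-fold against MEK-DD(E114D), whereas GDC-0973 stays within about tenfold of its wild-type Ki for both constructs.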

To confirm these findings in cells, we transfected wild-type MEK or 
MEK-DD into cells (note: MEK-DD(E114D) is catalytically dead and 
thus not used) and assessed the ability of the inhibitors to prevent 
phosphorylation of ERK (Fig. 4d). Consistent with the binding studies, 
we found that GDC-0623 and G-573, but not GDC-0973, lost potency 
against MEK-DD. 

We proposed that the strong binding of GDC-0973 to activated MEK 
might also explain its variable potency in KRAS-mutant cell lines. 
Comparison of pMEK levels in KRAS-MT lines that were sensitive versus
resistant to GDC-0973 (Fig. 1a) demonstrated that the most sensitive 
lines had high basal pMEK and pERK levels comparable to those in the 
A375 BRAF(V600E) line (Supplementary Fig. 13). Therefore, cells with 
high levels of activated MEK are sensitive to GDC-0973 even in the
KRAS-mutant genotype.

In conclusion, our study elucidates the mechanisms of distinct classes 
of allosteric MEK inhibitors that control their efficacy in KRAS versus 
BRAF-mutant tumours. We demonstrate that MEK inhibitors that form 
a strong hydrogen bond to S212 prevent phosphorylation of MEK by
RAF and stabilize a RAF-MEK complex, leading to greater efficacy in 
KRAS-mutant tumours. Why this interaction is needed for strong inhibi- 
tion of MEK activity downstream of WT RAF is not entirely understood. 
Other MEK inhibitors have been shown to inhibit RAF phosphorylation 
of MEK, although the mechanism had not previously been elucidated.
Furthermore, whereas the interaction of MEK inhibitors with S212 has
been previously observed, our study is the first to link directly the
strength of this interaction to the prevention of RAF-mediated MEK 
activation and increased potency in KRAS-mutant cells. 


In the context of BRAF(V600E) mutant tumours, we discovered 
that the superior efficacy of GDC-0973 against BRAF-mutant tumour 
models correlates with its potent binding and inhibition of activated 
MEK. Because of the high basal activity of MEK in BRAF(V600E) 
mutant tumours, this results in increased efficacy in this genetic back- 
ground. Our data are consistent with previous studies demonstrating 
that MEK is activated differently by oncogenic BRAF(V600E) com- 
pared to wild-type BRAF and CRAF in tumours with oncogenic 
KRAS. We demonstrate that these differences can be exploited
by targeting MEK inhibitors against distinct activated states of MEK, 
leading to increased efficacy in BRAF or KRAS-mutant preclinical 
models. 

With respect to the implications of these findings for the clinic, our 
studies were carried out with clinically relevant doses. Importantly, 
GDC-0973 has already shown strong clinical efficacy in BRAF(V600E) 
melanoma patients, with partial responses observed in 7/12 melanoma 
patients treated at MTD where 6 of the 7 responders had BRAF(V600E) 
mutations (Rosen, L. et al., unpublished data). Clinical trials with GDC-0973
and GDC-0623 will assess whether the observed differences in
mechanism of action translate to differential therapeutic benefit for 
cancer patients. 


METHODS SUMMARY 


GDC-0973 and G-573 were synthesized at Genentech and have been described
previously. GDC-0623 and G-530 were synthesized according to the procedure
in patent number US7803839 (2010). PD0325901 (PD901) was purchased from 
Active Biochem. Cells were obtained from the Genentech cell line repository and 
were cultured according to ATCC specifications. The establishment of MiaPaCa, 
H2122, A375 and Colo 205 xenografts is described in detail in Methods in the 
Supplementary Information. Kinase reactions, immunoprecipitations and western 
blots were conducted as described previously. The binding studies of MEK1 to
small-molecule inhibitors were performed by competitive fluorescence polarization
and are described in Methods in the Supplementary Information. MEK1 purification
and crystallization conditions are described in Methods in the Supplementary 
Information. Structure determination, refinement and modelling are described in 
Methods in the Supplementary Information and Supplementary Table 2. 
More than 130 million people worldwide chronically infected with
hepatitis C virus (HCV) are at risk of developing severe liver disease. 
Antiviral treatments are only partially effective against HCV infec- 
tion, and a vaccine is not available. Development of more efficient 
therapies has been hampered by the lack of a small animal model. 
Building on the observation that CD81 and occludin (OCLN) com- 
prise the minimal set of human factors required to render mouse 
cells permissive to HCV entry, we previously showed that transient
expression of these two human genes is sufficient to allow viral uptake
into fully immunocompetent inbred mice. Here we demonstrate
that transgenic mice stably expressing human CD81 and OCLN also 
support HCV entry, but innate and adaptive immune responses 
restrict HCV infection in vivo. Blunting antiviral immunity in gene- 
tically humanized mice infected with HCV results in measurable 
viraemia over several weeks. In mice lacking the essential cellular co- 
factor cyclophilin A (CypA), HCV RNA replication is markedly 
diminished, providing genetic evidence that this process is faithfully 
recapitulated. Using a cell-based fluorescent reporter activated by 
the NS3-4A protease we visualize HCV infection in single hepato- 
cytes in vivo. Persistently infected mice produce de novo infectious 
particles, which can be inhibited with directly acting antiviral drug 
treatment, thereby providing evidence for the completion of the 
entire HCV life cycle in inbred mice. This genetically humanized 
mouse model opens new opportunities to dissect genetically HCV 
infection in vivo and provides an important preclinical platform for 
testing and prioritizing drug candidates and may also have utility 
for evaluating vaccine efficacy. 

The narrow species tropism of HCV is incompletely understood. 
Mouse cells do not support viral entry and inefficiently replicate HCV 
RNA, but do support virion assembly and release. HCV uses numerous 
cellular factors to enter hepatocytes in a coordinated multistep process, 
including glycosaminoglycans, low-density lipoprotein receptor, scavenger
receptor class B type I (SCARB1), the tetraspanin CD81 (ref. 6),
the tight junction proteins claudin 1 (CLDN1) and occludin (OCLN),
the receptor tyrosine kinases epidermal growth factor receptor and
ephrin receptor A2 (ref. 9), and the cholesterol uptake receptor
Niemann-Pick C1 like 1 (ref. 10). Of these, CD81 and OCLN comprise
the minimal human factors needed for HCV uptake into rodent cells.

We recently demonstrated that adenoviral delivery of HCV entry 
factors renders mice susceptible to HCV infection. This transient
approach is high-throughput and allows the possibility of rapidly 
evaluating mutant versions of HCV entry factors. However, adenoviral 
gene delivery strongly induces interferon-stimulated genes, creating an 
environment that may antagonize HCV replication. To limit variabi- 
lity and to prevent vector-mediated immune activation, we generated 
transgenic mice stably expressing human CD81, SCARB1, CLDN1 
and/or OCLN under the control of a liver-specific albumin promoter. 


Transgenic expression of the human orthologues of the HCV entry 
factors resulted in similar mRNA levels of the human and endogenous 
mouse genes in the murine liver (Supplementary Fig. 1) and expression 
of all four proteins (Supplementary Fig. 2) with the expected subcel- 
lular distribution in the liver (Supplementary Fig. 3). Next, we aimed to 
test the susceptibility of entry factor transgenic (EFT) mice to HCV 
infection. To identify founder lines supporting viral entry we took advan- 
tage of a previously generated, highly sensitive detection system which 
is based on the activation of a loxP-flanked STOP-luciferase reporter in 
the genome of ROSA26-Fluc mice by Cre recombinase encoded in 
recombinant HCV genomes. We crossed EFT mice to a ROSA26-Fluc
background and challenged these animals with a bicistronic HCV genome
expressing Cre (HCV-Cre). Consistent with previous data, the bioluminescent
reporter was activated in mice expressing human CD81
and OCLN (Fig. 1a and Supplementary Fig. 4a). The addition of human 
SCARB1 and CLDN1 (Supplementary Fig. 4b) did not increase the 
entry signal, demonstrating that their murine orthologues are func- 
tional for HCV entry in vivo. For subsequent experiments, founder 
lines Alb-hCD81/hOCLN#941 (2hEF) and Alb-hCD81/hSCARB1/ 
hCLDN1/hOCLN#100 (4hEF), which supported the most efficient 
viral uptake (Supplementary Fig. 4a), were used. To estimate the num- 
ber of HCV-infected liver cells, we used an indicator mouse strain in 
which Cre leads to activation of a nuclear-localized green fluorescent 
protein/β-galactosidase (GNZ) reporter (ROSA26-GNZ). Similar to
our previous observations, HCV-Cre infection resulted in reporter
activation in approximately 1-1.5% of murine hepatocytes in 2hEF 
or 4hEF mice (Fig. 1b and Supplementary Fig. 5). To provide additional 
evidence that viral uptake into EFT mice is mediated by the specific 
interaction of HCV glycoproteins with host entry factors, we adminis- 
tered antibodies directed against the HCV envelope glycoprotein com- 
plex E1E2 or the host entry factor CD81. Delivery of anti-human CD81 
or anti-E1E2 (AR4A; ref. 12) antibodies resulted in a dose-dependent 
inhibition of HCV-Cre infection (Fig. 1c), whereas isotype control immu- 
noglobulins had no effect. These data further affirm that HCV is taken up 
in a viral glycoprotein-specific fashion in vivo and underscore the utility 
of this model for evaluation of passive immunization strategies. 

Direct measurement of HCV genome levels by quantitative reverse 
transcription (qRT)-PCR demonstrated a slight increase in HCV 
RNA in the serum (at 4h) and liver (at 3h and 24h) of inoculated 
mice expressing the human entry factors; at 72 h, however, the signal 
was reduced to background levels (Fig. 1d, e). HCV infection resulted 
in the upregulation of several interferon-stimulated genes (Fig. 1f, g 
and Supplementary Fig. 6), infiltration of immune cells, especially 
natural killer (NK) cells, into the liver (Fig. 1i), and elevated proinflammatory
cytokine levels in the serum (Fig. 1h and Supplementary
Fig. 7), which could antagonize HCV replication. This hypothesis is
further supported by the previous observation that HCV replicons,
selectable HCV RNA genomes, replicate more efficiently in murine
cells with impaired antiviral signalling.

Figure 1 | Transgenic expression of human CD81 and OCLN renders mice
permissive to HCV entry. a, Longitudinal bioluminescence imaging of
ROSA26-Fluc mice expressing either human CD81 plus OCLN, SCARB1 plus
CLDN1, CD81 alone, OCLN alone, or all four HCV entry factors (4hEF).
b, Quantification of viral uptake into murine hepatocytes expressing human
CD81 and OCLN or all four human entry factors determined by flow cytometry
72 h after infection with BiCre-Jc1. c, Blocking of HCV infection in vivo by
either blocking antibodies against CD81 (JS-81) or neutralizing antibodies
against HCV E1E2 (AR4A). ROSA26-Fluc 4hEF mice were injected with the
indicated amounts of antibodies 24 h and 4 h before infection with BiCre-Jc1.
d, e, Longitudinal quantification of HCV RNA by RT-qPCR in (d) serum and
(e) liver of either wild-type mice or mice expressing all four HCV entry factors;
dotted lines indicate the limit of detection (l.o.d.). f, g, Expression of the
interferon-stimulated genes Ifi27, Ifi44, Mx1 and Eif2ak2 (f) and viperin (also
known as Rsad2), Oas1a and Ip10 (g) in the liver after infection with BiCre-Jc1
in wild-type or 4hEF mice. h, Serum levels of IFN-γ in wild-type or 4hEF mice
infected with BiCre-Jc1. i, Analysis of liver-infiltrating IFN-γ-secreting
NKp46-positive NK cells in BiCre-Jc1-infected wild-type or 4hEF mice. All data shown
are mean ± s.d. of four independent experiments. For panels e-g, four mice
were used per time point. *P < 0.05.

To identify a murine environment more conducive to HCV replica- 
tion we crossed 4hEF ROSA26-Fluc mice to strains carrying targeted 
disruptions in Eif2ak2, Mavs, Irf1, Irf3, Irf7, Irf9, Stat1 or the Ifn-αβ
receptor (Fig. 2). These strains are viable and known to be hyper- 
susceptible to RNA viruses, due to impaired innate immune responses. 
The luminescent reporter signal was slightly elevated during the early 
phase after infection with HCV-Cre in most EFT strains impaired in 
antiviral signalling as compared to EFT mice on a wild-type back- 
ground (Fig. 2a-d). Between 20 and 40 days after infection there was
a marked increase in the luciferase signal, particularly in IRF1 (eightfold),
IRF7 (16-fold) (Fig. 2c), IFN-αβR (20-fold) and STAT1 (40-fold)
deficient mice (Fig. 2d) compared to non-transgenic littermate controls.
The increased reporter signal, which presumably reflects a transient 
burst in viral replication and spread, eventually returned to background 
levels, possibly marking clearance of HCV-infected cells by the murine 
immune system. The elevated luminescent signal correlated with 
increases in serum HCV RNA levels at peak time points (Fig. 2e). 
Figure 2 | Blunting of antiviral immune responses in mice expressing HCV
entry factors augments HCV RNA replication. a-d, Bioluminescence
kinetics of BiCre-Jc1-infected ROSA26-Fluc mice expressing all four HCV
entry factors and entry-factor-negative controls with fully intact innate
immune system (a) or impaired in EIF2AK2 or MAVS (b), IRF1, IRF3, IRF7 or
IRF9 (c), and IFN-αβR or STAT1 (d). e, HCV RNA levels in the serum of 4hEF
mice or 4hEF mice deficient in STAT1, IRF1, IRF3, IRF7, MAVS or IRF9 6
weeks after infection with BiCre-Jc1.


To validate that the elevated signal is indeed due to increased HCV 
RNA replication in mice with blunted antiviral immunity, we crossed 
cyclophilin A (CypA)-deficient mice (Ppia-/-) to the EFT ROSA26-Fluc
Stat1-/- background. CypA is a critical host factor for HCV RNA
replication in human cells. In Ppia-/- 4hEF ROSA26-Fluc Stat1-/-
mice, HCV RNA (Fig. 3c) and the luminescent reporter signal at peak
times (day 31) were more than 60% lower than those of Ppia+/+ and
Ppia+/- mice (Fig. 3d), probably as a direct consequence of the murine
PPIA deficiency. These data provide the first direct genetic evidence 
that CypA is a bona fide HCV replication factor in vivo. Furthermore, 
treatment of 4hEF Stat1-/- mice with an NS5A inhibitor for 3 weeks
after infection suppressed HCV RNA loads to below the limit of detec- 
tion (Fig. 3e), providing additional evidence that HCV RNA does indeed 
replicate in these mice. 

Unambiguous detection of HCV antigens in situ is difficult. There- 
fore, we constructed a transgenic mouse line expressing a modified 
version of a previously described cell-based fluorescent reporter system
to visualize infection directly in the liver of infected mice (Supplemen- 
tary Fig. 8). This highly sensitive reporter, the activity of which directly 
correlates with the level of HCV RNA replication, is based on cleavage 
of a blue fluorescent protein (TagBFP)-MAVS fusion protein by the 
HCV NS3-4A serine protease and translocation of the mitochondrially
anchored fluorescent protein to the nucleus. The chimaeric TagBFP- 
nlsMAVS protein was widely expressed in the liver as assessed by 
histology and flow cytometry (Supplementary Fig. 8a, b). To quantify 
HCV protease-mediated nuclear re-localization accurately we subjected 
single hepatocyte suspensions to ImageStream X analysis. Adenovirus- 
mediated overexpression of NS3-4A activated the reporter in a majority 
of cells (Supplementary Fig. 9), demonstrating the functionality of the 
reporter in vivo. In EFT+ but not EFT- (Fig. 3a) TagBFP-nlsMAVS
transgenic mice HCV infection causes overlap of TagBFP fluorescence 
with the nuclear counterstain in approximately 0.2% of the cells 
(Fig. 3b). This low but HCV-specific reporter activation is probably 
due to the limited amounts of NS3-4A being translated from the 
incoming viral RNA, which cannot accumulate as a consequence of 
the abortive, presumably single-cycle, infection on a fully immunocompetent
background. In contrast, crossing the EFT+ TagBFP-nlsMAVS
strain to a Stat1-null background increased the frequency of cells displaying
HCV infection twofold over STAT1-positive controls (to
0.4%). Notably, whereas the reporter was activated only transiently
in STAT1-sufficient mice, the signal persisted at least until day 10 in
mice on a Stat1-/- background (Fig. 3b). Taken together, these results
indicate that HCV can replicate in immunocompromised mice expressing
human CD81 and OCLN and highlight the value of the model for
studying HCV replication in vivo.

Figure 3 | Visualization and genetic and pharmacological interference with
HCV infection. a, b, Quantification of murine hepatocytes actively replicating
HCV in wild-type, 4hEF and 4hEF Stat1-/- mice as measured by the HCV
NS3-4A-dependent cleavage of the TagBFP-nlsMAVS transgenic reporter
construct by ImageStream X analysis. c-e, Longitudinal HCV RNA levels and
luciferase signal in 4hEF Stat1-/- mice lacking PPIA (c, d), and longitudinal
HCV RNA levels in 4hEF Stat1-/- mice treated with an HCV NS5A inhibitor
(BMS-790052) for 20 days (e). Data shown are mean ± s.d. of n = 10-18 mice
from two independent experiments. **P < 0.01.

Figure 4 | HCV infection in 4hEF Stat1-/- mice leads to immune
activation. a-d, HCV RNA copies in serum (a, c) and liver (b, d) of 4hEF
Stat1-/- mice during early (a, b) or late (c, d) infection with Con1/Jc1.
e, Relative frequencies of the indicated lymphocyte subsets in spleens of
wild-type, 4hEF, Stat1-/- or Stat1-/- 4hEF mice isolated at the indicated time
points after infection with Con1/Jc1. f, Analysis of liver-infiltrating
IFN-γ-producing CD3+CD8+ T cells of wild-type, 4hEF, Stat1-/- and 4hEF Stat1-/-
mice after infection with Con1/Jc1. Data shown are mean ± s.d. of three
independent experiments. ***P < 0.001.

We further characterized viral kinetics in mice infected with a mono- 
cistronic cell-culture-derived HCV (HCVcc). HCV RNA rose approxi- 
mately tenfold over the limit of quantification in serum (Fig. 4a, c) and 
liver (Fig. 4b, d) of EFT Stat1-/- mice as compared to non-transgenic
Stat1-/- controls. Mice remained persistently infected for most of the
observation period, with HCV RNA becoming nearly undetectable 
after 90 days (Fig. 4c, d). Sequence analysis of HCV RNA detected at 
late time points in mice infected with HCV-Cre revealed mutations in 
some viral genomes, but none was shared among the five mice that 
were analysed (Supplementary Fig. 12). Whether any of these are 
adaptive mutations that increase viral fitness in vivo will be the subject 
of future studies. HCV infection caused splenomegaly in some innate 
immune-deficient mice expressing HCV entry factors, but not in non- 
EFT mice (Supplementary Fig. 10a), indicative of HCV-specific immune 
activation. Indeed we observed increased frequencies of NK cells and 
subsequently B cells in the spleens of 4hEF Stat1-/- mice in the early
and late phases of infection, respectively (Fig. 4e and Supplementary 
Fig. 10b). Consistent with results in humans and chimpanzees, primarily
NK and IFN-γ-producing CD8+ T cells (Fig. 4f) infiltrated the
livers of 4hEF Stat1-/- mice with a skewing towards an effector memory
phenotype near the end of the time course (Supplementary Fig. 10c).
These data indicate that HCV infection elicits cellular immune responses,
albeit potentially confounded by the STAT1 deficiency, which
may contribute to eventual viral clearance.

To determine whether primary hepatocytes in EFT mice on immu- 
nodeficient backgrounds were capable of producing infectious virions, 
sera collected at day 40 after infection were used to inoculate naive 
Huh-7.5 cells. Infectious virus was detected in sera of 4hEF mice deficient
for STAT1, IRF1 and IRF7 (Fig. 5a), consistent with the increased
levels of luminescent reporter activity (Fig. 2c, d) and HCV RNA load
(Fig. 2e) in these strains. In HCV-infected 4hEF Stat1-/- mice, titres
reached approximately 100 tissue culture infectious dose 50 (TCID50)
per ml (Fig. 5b). Sera from HCV-infected non-EFT littermates, or directly
acting antiviral (DAA)-treated 4hEF Stat1-/- or 4hEF Stat1-/- Ppia-/-
mice collected at the same time points, did not yield NS5A-positive
cells (data not shown), indicating de novo virus production rather than
carry-over of the inoculum. To demonstrate that the lack of HCV
NS5A-positive cells in the infectivity assay cannot simply be attributed
to residual quantities of the DAA in circulation, we spiked the sera
from the BMS-790052-treated animals with tissue-culture-produced
BiCre-Jc1 virus (Fig. 5b) and found no inhibitory effect. Furthermore,
in Huh-7.5 cells expressing a TagRFP-nlsMAVS reporter, infection
with sera from infected mice resulted in re-localization of the reporter
signal whereas inoculation with sera from non-infected, DAA-treated
or Ppia-/- mice did not (Supplementary Fig. 11). These data are in
accordance with previous in vitro observations demonstrating that
mouse hepatoma cells support late stages of the HCV life cycle.

Figure 5 | Evidence for production of infectious particles. a, Stat1-/-,
Irf1-/-, Irf3-/-, Irf7-/-, Irf9-/- and Mavs-/- mice expressing all four human
HCV entry factors were infected with BiCre-Jc1. Sera were collected 6 weeks
after infection and were used to infect naive Huh-7.5 cells. NS5A staining was
performed 72 h after infection and the frequency of HCV antigen-bearing cells
quantified by flow cytometry. b, HCV infectious particles released into the
serum of 4hEF Stat1-/- mice, 4hEF Stat1-/- mice lacking PPIA or 4hEF
Stat1-/- mice treated with BMS-790052 for 20 days as determined by limiting
dilution assay. Data shown are mean ± s.d. of four independent experiments.
ND, not detectable.

This study represents an important step forward in developing an 
animal model for HCV infection and immunity. To our knowledge 
this is the first time that the entire HCV life cycle has been recapitu- 
lated in inbred mice with inheritable susceptibility to HCV. Previously 
developed xenotransplant mouse models can also be infected with HCV. 
However, these models are hampered by intra- and inter-experimental 
variability, donor-to-donor variability, low throughput, and high costs. 
The inbred mouse model presented here can overcome many of these 
challenges, is amenable to genetic manipulations and can be used for
preclinical assessment of the efficacy of antiviral drug candidates and 
entry inhibitors. To study unperturbed HCV-specific immune res- 
ponses and HCV-associated pathogenesis it will be necessary to estab- 
lish persistent HCV infection in fully immunocompetent mice. By 
harnessing the remarkable genetic plasticity of HCV it may be possible 
to select for viral variants that replicate more robustly in sufficiently 
immunocompromised rodent strains. High titre sera could subse- 
quently be passaged through progressively more immunocompetent 
hosts to produce this outcome. Future studies will address the capacity 
of different, genetically diverse HCV genotypes to establish chronicity 
in genetically humanized mice. 


METHODS SUMMARY 


Mice. Construction of all mice expressing HCV entry factors under the control of a
liver-specific albumin promoter or CAGGS-TagBFP-nlsMAVS and the source of 
all other mouse mutant strains is described in the Methods. Mice were bred and 
maintained at the Comparative Bioscience Center of The Rockefeller University 
according to guidelines established by the Institutional Animal Committee. 
Hepatitis C virus. Plasmids encoding chimaeric HCV genomes, including Jc1,
Con1-Jc1 and bicistronic HCV genomes expressing Cre, were linearized with XbaI
and in vitro transcribed. RNA was electroporated into Huh-7.5 cells and infectious
virus was collected from supernatants 48-72 h after transfection.

RT-PCR quantification of HCV RNA. Total RNA was isolated from mouse 
brain, liver and sera using the RNeasy kit (Qiagen). HCV genome copy number
was quantified by one-step RT-PCR. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS

Animals and cell lines. Gt(ROSA)26Sor'™! "Klin (ref, 19) (ROSA26-Fluc), 
B6;129-Gt(ROSA)26Sor'™°°/] (ref, 20) (Rosa26-GNZ) and C57BL/6 (wild type) 
mice were obtained from The Jackson Laboratory. J. Colgan (University of Iowa)
made the Ppia'™!'"" (Ppia”'~)*! mice available. ROSA26-Fluc mice contain the 
firefly luciferase (luc) gene inserted into the Gt(ROSA)26Sor locus. Expression of
the luciferase gene is blocked by a loxP-flanked STOP fragment placed between 
the luc sequence and the Gt(ROSA)26Sor promoter. Cre-recombinase-mediated 
excision of the transcriptional stop cassette results in luciferase expression in Cre- 
expressing tissues. ROSA26-GNZ knock-in mice have widespread expression 
of a nuclear-localized green fluorescent protein-β-galactosidase fusion protein
(nlsGFP-GNZ) once an upstream loxP-flanked STOP sequence is removed.
When Cre recombinase is introduced into cells the resulting GNZ fusion protein 
expression allows for enhanced (single cell level) visualization. Mice were bred and 
maintained at the Comparative Bioscience Center of the Rockefeller University 
according to guidelines established by the Institutional Animal Committee. Huh- 
7.5 (ref. 22), Huh-7.5.1 (ref. 23) and Huh-7.5 stably expressing the TagRFP- 
nlsMAVS reporter’® were maintained in DMEM with 10% fetal bovine serum 
(FBS) and 1% nonessential amino acids (NEAA). 

Mutant mice with targeted disruptions in genes involved in antiviral defences. 
Irfy'™!M*« (Irf1~/~ 4 mice were obtained from the Jackson Laboratory, Ifnar1"™!4& 
(Ifnar1 ~'~)5 from B&K Universal Ltd, and Stat1'™!’ (Stat1~/~ )?* from Taconic. 
Bel2112/Irf3°" "8 (Irf3/~)?, Irf7°™ 8 (Irf7_'~)?8 and Irfo'™ "8 (Info '~)? mice 
were provided by T. Taniguchi. T. Satoh and S. Akira provided Dhxsa'm(As0K) Ali 
(LGP2A30K/A30K)” and T. Kawai and S$. Akira provided Mavs'™4* (Mavs /~)?! 
mice. A. Garcia-Sastre (Mount Sinai School of Medicine) made the Eifzak2"™!©"" 
(Eif2ak2~' ~)?? and A. Berns (Netherlands Cancer Institute) the Rag2 / ~ TL2RNU! 
(ref. 33) mice available. 

Generation of HCV entry factor transgenic mice. cDNAs encoding human 
CD81, SCARB1, CLDN1 or OCLN were cloned into a vector between a chimaeric 
intron and the 3’ flanking regions of human growth hormone (GH1), in which the 
mouse albumin enhancer/promoter drives gene expression**. Vector-free human 
CD81, SCARB1, CLDN1 and/or OCLN expression fragments were prepared by 
NotI and KpnI digestion and microinjected alone or in combination into fertilized 
C57BL/6 mouse eggs. Transgenic offspring were mated with C57BL/6 wild-type 
animals to select for founder lines stably inheriting the transgene(s). In some mice 
that were co-injected with multiple expression constructs, transgenes did not 
segregate in the F1 generation, indicating separate insertions in close proximity
or insertion as concatemers. The founder lines were designated as follows: 
C57BL/6-Tg(Alb-hCD81)976""*" (Alb-hCD81), C57BL/6-Tg(Alb-hSCARB1)1°*" 
(Alb-hSCARB1), C57BL/6-Tg(Alb-hOCLN)70"°2" (Alb-hOCLN), C57BL/6- 
Tg(Alb-hCD81/hOCLN)941?"""" (Alb-hCD81/hOCLN), C57BL/6-Tg(Alb-hSCARB1/ 
hCLDN1)935"°""¥ (Alb-hSCARB1/hCLDN1), C57BL/6-Tg(Alb-hCD81/hSCARB1
/hCLDN1/hOCLN) 100°" (Alb-EFT4x). 

Generation of TagBFP-nlsMAVS reporter mice (C57BL/6-Tg(TagBFP- 
nlsMAVS)4065). The TagRFP-NLS-MAVS(WT) cassette’® was inserted into pCR2.1- 
TOPO (Invitrogen) and modified to contain TagBFP in place of TagRFP. The 
resulting TagBFP-NLS-MAVS cassette was inserted into the pCAGGS vector
(Addgene) to yield pCAGGS-TagBFP-NLS-MAVS(WT). The pCAGGS backbone
drives transgene expression from a ubiquitously active chimaeric CMV/β-actin
promoter*’ and has been used to create transgenic mice successfully (for
example, ref. 36). The pCAGGS-TagBFP-NLS-MAVS(WT) internal transgenesis
cassette was isolated by linearization with SalI/PstI enzymes and injected into
C57BL/6 pronuclei. Founder animals were identified by PCR and bred with con- 
genic C57BL/6J animals. 

HCV generation and infections. Construction of BiCre-Jc1 (ref. 2), Jc1 (ref. 37),
Con1-Jc1 (ref. 38) and Jc1 5AB Ypet*’ was described elsewhere. Huh-7.5.1 (ref. 23)
or Huh-7.5 (ref. 22) cells were electroporated with in vitro transcribed full-length
HCV RNA. Seventy-two hours after electroporation, the medium was replaced
with DMEM containing 1.5% FBS and supernatants were collected every 6 h starting
from 72 h. Pooled supernatants were clarified by centrifugation at 1,500g, filtered
through a 0.45 µm bottle top filter (Millipore) and concentrated using a stirred cell
(Millipore). Viral titres (TCID50) were determined using Huh-7.5 cells as previously
described'*. All infections of mice with the indicated genotypes were performed 
intravenously. 

Antibodies and inhibitors. Blocking antibodies against CD81 (JS81) and IgG1 
control antibodies were obtained from BD Biosciences. Antibodies against NS5A 
(ref. 18) and E2 (clone AR4A)”’ and the human IgG1 isotype control (b12 (ref. 40),
provided by D. Burton, The Scripps Research Institute) have been described previ- 
ously. Antibodies for the detection of human CD81 were purchased from BD 
Biosciences, OCLN from BD Biosciences (for histology) and from Invitrogen 
(for western blotting), CLDN1 from Invitrogen (for western blotting) and 
Abcam (for histology), and SCARB1 from GeneTex (for histology) and from BD
Biosciences (for western blotting). The HCV NS5A inhibitor BMS-790052 (ref. 41)
was obtained from Selleck Chemicals.

Adenovirus constructs. Adenoviral constructs encoding HCV JFH1 NS3-4A 
were created using the AdEasy Adenoviral Vector System (Agilent Technologies) 
according to the manufacturer’s instructions. Briefly, JFH1 NS3-4A cDNA was
PCR-amplified from pCR3.1-NS3/4A and inserted into the pShuttle-CMV using
KpnI/NotI sites. Recombinant pShuttle-CMV plasmids were linearized with PmeI
and ligated to pAdEasy by homologous recombination followed by electroporation
into BJ5183 cells (Agilent). Recombinant pShuttle-pAdEasy constructs were identified
by PacI restriction analysis. All plasmid constructs were verified by DNA
sequencing. 

Histological detection of HCV entry factors. Liver and spleen of mice injected 
with adenoviruses encoding human entry factors were collected 24 h after injec- 
tion and fixed using 4% paraformaldehyde. Tissue sections (8 µm) were deparaffinized
and subjected to antigen retrieval by boiling for 30 min in citrate buffer
(10 mM sodium citrate, 0.05% Tween 20, pH 6.0). Entry factors were stained with 
human-specific primary antibodies for 16 h at 4°C followed by secondary anti- 
body staining using Alexa 488 or Alexa 633-conjugated antibodies for 2 h at room 
temperature. For in situ detection of eGFP fluorescence, mouse tissue was imme- 
diately frozen in OCT (Optimal Cutting Temperature) compound at −80 °C. Tissue
sections (~5-6 µm) were cut on poly-L-lysine-coated slides. Secondary antibodies
goat-anti-mouse or goat-anti-rabbit Alexa 488- or rhodamine conjugates (Invitrogen; 
1:1,000) were used for immunofluorescence. Nuclei were detected using DAPI in 
VectaShield Mounting medium (Vector Laboratories). Images were captured on 
an Axioplan 2 imaging fluorescence microscope (Zeiss) using Metavue Software 
(Molecular Devices). Images were processed using ImageJ software (NIH).

Isolation of murine hepatocytes. Mice were anaesthetized by intraperitoneal
injection of a mixture of 100 mg kg⁻¹ ketamine and 10 mg kg⁻¹ xylazine. Livers
were perfused through the inferior vena cava for 5 min each with chelating buffer
(0.5 mM EGTA, 0.05 M HEPES pH 7.3 in Ca/Mg-free HBSS) at a flow rate of
2 ml min⁻¹ followed by collagenase solution (4.8 mM CaCl2, 100 U ml⁻¹ collagenase
type IV, 0.05 M HEPES pH 7.3 in Ca/Mg-free HBSS). The resulting cell suspension
was passed through a 100 µm cell strainer, washed twice in HBSS and fixed in
4% paraformaldehyde. Purity of isolated hepatocytes was over 90% in all prepara- 
tions as confirmed by intracellular staining for murine albumin. 

Immune activation. Lymphocytes were isolated from liver and spleen by diges- 
tion with 0.1% collagenase (Sigma) for 30 min at 37 °C. Lymphocytes were then 
isolated from the cell suspensions as well as from peripheral blood by density 
gradient centrifugation. Cells were stained with directly fluorochrome-conjugated 
antibodies against CD3, CD4, CD8, B220 (eBioscience) and NKp46 (BD Biosciences). 
After cell surface staining, cells were fixed and permeabilized using BD Cytofix/ 
Cytoperm (BD Biosciences) and stained with antibodies against IFN-γ and TNF-α.
Samples were measured using a BD LSR2 flow cytometer (BD Biosciences) and
data were analysed using FlowJo (Treestar Software).

Western blotting. Perfused murine liver tissue was homogenized in lysis buffer 
containing 1% Triton X-100, 50 mM Tris-HCl pH 8, 150 mM NaCl and Mini
EDTA-free Protease Inhibitor Cocktail (Roche) for 30 min on ice. Thirty micrograms
of protein lysate was separated on 4-12% Bis/Tris NuPage polyacrylamide
gels (Invitrogen). Proteins were transferred to nitrocellulose membranes and entry
factors were detected using antibodies against human SCARB1 (1:500) and CLDN1
(1:200). β-actin (1:10,000) was probed as a loading control. After secondary antibody
staining with HRP-conjugated anti-mouse IgG Fc (JIR, 1:10,000), western
blots were visualized using SuperSignal West Pico (Thermo Scientific).

RT-PCR quantification of HCV entry factors and interferon-stimulated
genes. To quantify expression of human and murine genes (entry factors and 
interferon-stimulated genes), the livers of FVB/NJ mice were collected at the 
indicated time points. Total liver RNA was isolated using RNeasy isolation kit 
(Qiagen) and cDNA was synthesized from 0.5 µg RNA using a SuperScript VILO
cDNA Synthesis kit (Invitrogen) according to manufacturer’s instructions. Quanti- 
tative PCR was performed with a light cycler LC480 (Roche Applied Science) using 
an Applied Biosystems SYBR Green PCR Master Mix and the following primer 
pairs: human CD81 forward 5'-TGTTCTTGAGCACTGAGGTGGTC-3’, reverse 
5'-TGGTGGATGATGACGCCAAC-3’; human SCARB1 forward 5'-CGGATTT 
GGCAGATGACAGG-3’, reverse 5’-GGGGGAGACTCTTCACACATTCTAC-3’; 
human CLDN1 forward 5'-CACCTCATCGTCTTCCAAGCAC-3’, reverse 5’-
CCTGGGAGTGATAGCAATCTTTG-3’; human OCLN forward 5'-CGGCAAT 
GAAACAAAAGGCAG-3’, reverse 5'-GGCTATGGTTATGGCTATGGCTAC-3’; 
mouse Cd81 forward 5'-GGCTGTTCCTCAGTATGGTGGTAG-3’, reverse 5’- 
CCAAGGCTGTGGTGAAGACTTTC-3’; mouse Scarb1 forward 5'-CAAAAAGC 
ATTTCTCCTGGCTG-3’, reverse 5’-AATCTGTCAAGGGCATCGGG-3’; mouse 
Cldn1 forward 5'-TTATGCCCCCAATGACAGCC-3’, reverse 5’-ATGAGGTGC 
CTGGAAGATGATG-3’; mouse Ocln forward 5’-ACTAAGGAAGCGATGAA 
GCAGAAG-3’, reverse 5'-GCTCTTTGGAGGAAGCCTAAACTAC-3’; mouse 
Gapdh forward 5'-ACGGCCGCATCTTCTTGTGCA-3’, reverse 5'-ACGGCCA
AATCCGTTCACACC-3’; mouse viperin forward 5'-TGCTGGCTGAGAATAG 
CATTAGG-3’, reverse 5'-GCTGAGTGCTGTTCCCATCT-3’; mouse Ifi27 forward
5'-GCTTGTTGGGAACCCTGTTTIG-3’, reverse 5'-GGATGGCATTTGTTGAT
GTGGAG-3’; mouse Ifi44 forward 5’-AACTGACTGCTCGCAATAATGT-3’,
reverse 5’-GTAACACAGCAATGCCTCTTGT-3’; mouse Mx1 forward 5'-GAC 
CATAGGGGTCTTGACCAA-3’, reverse 5’-AGACTTGCTCTTTCTGAAAAG 
CC-3'; mouse Eif2ak2 forward 5'-ATGCACGGAGTAGCCATTACG-3’, reverse 
5’-TGACAATCCACCTTGTTTTCGT-3’; mouse Oas1a forward 5'-ATGGAGC
ACGGACTCAGGA-3’, reverse 5’-TCACACACGACATTGACGGC-3’; mouse 
Ifnb1 forward 5'-CAGCTCCAAGAAAGGACGAAC-3’, reverse 5'-GGCAGTG 
TAACTCTTCTGCAT-3’; mouse Ip10 forward 5'-CCAAGTGCTGCCGTCATT
TTC-3’, reverse 5'-GGCTCGCAGGGATGATTTCAA-3’. 
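
Primer lists of this length are prone to transcription errors, so a quick sanity check can be useful before ordering or re-using the oligonucleotides. The sketch below is not part of the published protocol; `check_primer` is a hypothetical helper that reports the length, GC fraction and any non-ACGT characters of a sequence. It is applied to the Gapdh forward primer and to the Ifi27 forward primer exactly as printed above.

```python
# Minimal primer sanity check (illustrative helper, not from the protocol).
VALID_BASES = set("ACGT")


def check_primer(seq):
    """Return length, GC fraction and 0-based positions of non-ACGT characters."""
    seq = seq.replace(" ", "").upper()
    invalid = [i for i, base in enumerate(seq) if base not in VALID_BASES]
    gc_count = sum(base in "GC" for base in seq)
    return {
        "length": len(seq),
        "gc_fraction": gc_count / len(seq) if seq else 0.0,
        "invalid_positions": invalid,
    }


# Sequences copied from the primer list above.
gapdh_forward = check_primer("ACGGCCGCATCTTCTTGTGCA")
ifi27_forward = check_primer("GCTTGTTGGGAACCCTGTTTIG")

print(gapdh_forward)
print(ifi27_forward)
```

Any entry with a non-empty `invalid_positions` list should be checked against the original publication before use; the helper only flags the problem and does not attempt to guess the intended base.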

RT-PCR quantification of HCV RNA. Total RNA was isolated from the indi- 
cated mouse tissues using the RNeasy kit (Qiagen). HCV genome copy number
was quantified by one step RT-PCR using Multicode-RTx HCV RNA kit (Eragen) 
and a light cycler LC480 (Roche Applied Science), according to the manufacturers’ 
instructions. 

ImageStream. Hepatocytes were isolated as described above. Hepatocytes were 
fixed in 4% paraformaldehyde, permeabilized using 0.1% Triton X-100 and stained 
with antibodies against mouse albumin (Cedarlane) and DRAQ5 (eBioscience) 
before ImageStream acquisition using an ImageStream X with Multimag (Amnis).

Bioluminescence imaging. Unless otherwise specified, mice were injected with
10¹¹ adenoviral particles 24 h before intravenous injection with 2 × 10” TCID50
HCV-Cre. At 72 h after infection, mice were anaesthetized using ketamine/xylazine
and injected intraperitoneally with 1.5 mg luciferin (Caliper Lifesciences). Bioluminescence
was measured using an IVIS Lumina II platform (Caliper Lifesciences).

HCV genome sequencing. Total RNA isolated from mouse sera was used to
synthesize cDNA covering the HCV genome using the SuperScript III First-Strand 
Synthesis System for RT-PCR (Invitrogen Life Technologies). Briefly, cDNA frag- 
ments were synthesized covering the BiCre HCV genome, using the following 
primer sets: 10526/17376, 16768/12470, 5928/17377 and 6993/15692, or 10526/ 
17375, 15356/17376, 15695/6986, 3949/5996, 17119/12470, 5928/6815, 6993/ 
16864, 16863/17388, depending on the mouse (see below). 

The following primers were used for cDNA synthesis: RU-O-10526, 5’-ACCTG
CCCCTAATAGGGGCGAC-3’ (sense, 5’ UTR); RU-O-17376, 5’-TGCTGGCGT 
TGAAGTCAGCTC-3’ (antisense, E2); RU-O-16768, 5'-CGCACCCATACTGT 
TGGGGG-3' (sense, E2); RU-O-12470, 5’-AAGCCTCATACAGGACCTCC-3' 
(antisense, NS4A); RU-O-5928, 5'-GATGCTACCTCCATTCTCG-3’ (sense, NS5B); 
RU-O-17377, 5'-GCATTGTGGGCGCAACTATCC-3’ (antisense, NS5B); RU-O-6993,
5'-CCGCCCTCACCAGTCCGTTGT-3’ (sense, NS4B); RU-O-15692, 5'-TATT 
ACCGCCTTTGAGTGAGCTGA-3’ (antisense, 3’ UTR); RU-O-17375, 5'-CCG 
AACCACGGGGACGTGGTT-3’ (antisense, EMCV); RU-O-15356, 5’-GCAAG 
GTCTGTTGAATGTCG-3’ (sense, EMCV); RU-O-15695, 5'’-CGAATGTGGCC 
GTGCAGCGGC-3’ (sense, E1); RU-O-6986, 5'’-AGTCTTTGGAGCCGTGCAA 
GT-3’ (antisense, NS3); RU-O-3949, 5’-GCATCCTGATACCACTTACCTC-3' 
(sense, E2); RU-O-5996, 5'-CAGGTAGAGGAAGACAGGGCA-3’ (antisense, NS3); 
RU-O-17119, 5'-GCTCCCATCACTGCTTATGCC-3’ (sense, NS3); RU-O-6815, 
5'-CAACACCTATGACGTGGACATG-3’ (antisense, NS5A); RU-O-16864, 5’-GT 
AACTCGCTGTTGCGATACC-3’ (antisense, NS5B); RU-O-16863, 5’-CAGGT 
AGAGCTTCAACCTCCC-3’ (sense, NS5A); RU-O-17388, 5'-ATTGCCGGAGG 
CGCGCCTACT-3’ (antisense, 3’ UTR). 

Resulting cDNA fragments were then amplified by PCR using Taq polymerase 
and a combination of the PCR primers listed below. m44 and m55 primer pairs: 
3949/5996, 14317/17435, 7086/17376, 15355/16490, 16253/17377, 6993/6815, 3949/ 
6986, 17119/5996. m46 and m64 primer pairs: 3949/5996, 16768/6986, 15354/
16490, 6993/16864, 10526/17374, 15356/17376, 17119/12470, 14317/17435, 5928/6815, 
7086/17376. m54 primer pairs: 15355/17375, 7086/17376, 3949/6986, 3949/10215, 
17119/5996, 5928/12470, 6993/6815, 17391/16864, 14317/17388. 

Primers used for sub-fragment PCR amplification: RU-O-3949, 5'-GCATCC 
TGATACCACTTACCTC-3’ (sense, E2); RU-O-5996, 5’-CAGGTAGAGGAAG 
ACAGGGCA-3’ (antisense, NS3); RU-O-14317, 5'-CGGGTGGAGTATCTCTTGA 
A-3' (sense, NS5B); RU-O-17435, 5'-CTGTGTGAAATTGTTATCCGC-3’ (anti- 
sense, 3’ UTR); RU-O-7086, 5’-TGGGCAGGATGGCTCCTGTC-3’ (sense, core); 
RU-O-17376, 5'-TGCTGGCGTTGAAGTCAGCTC-3’ (antisense, E2); RU-O-15355, 
5'-TGGCAGAACGAAAACGCTG-3’ (sense, nlsCre); RU-O-16490, 5’-CGCTG 
CCGAAGTGAAGAACA-3’ (antisense, E1); RU-O-17377, 5'-GCATTGTGGGC 
GCAACTATCC-3’ (antisense, NS5B); RU-O-6993, 5’-CCGCCCTCACCAGTC 
CGTTGT-3’ (sense, NS4B); RU-O-6815, 5’-CAACACCTATGACGTGGACATG-3’ 
(antisense, NS5A); RU-O-16253, 5’-CATAGGTTTGCACCCACA-3’ (sense, NS5A);
RU-O-6986, 5'-AGTCTTTGGAGCCGTGCAAGT-3’ (antisense, NS3); RU-O-17119, 
5'-GCTCCCATCACTGCTTATGCC-3’ (sense, NS3); RU-O-16768, 5'-CGCAC 
CCATACTGTTGGGGG-3’ (sense, E2); RU-O-15354, 5’-CAAGAAGAAGAGG 
AAGGTGTC-3’ (sense, nlsCre); RU-O-10526, 5’-ACCTGCCCCTAATAGGGG 
CGAC-3’ (sense, 5’ UTR); RU-O-16864, 5’-GTAACTCGCTGTTGCGATACC-3’
(antisense, NS5B); RU-O-17374, 5'- CG€GTCCCAGCCACGTGGAAGGC -3’ (anti- 
sense, core); RU-O-15356, 5'’-GCAAGGTCTGTTGAATGTCG-3’ (sense, EMCV); 
RU-O-12470, 5’-AAGCCTCATACAGGACCTCC-3' (antisense, NS4A); RU-O-5928, 
5'-GATGCTACCTCCATTCTCG-3’ (sense, NS5B); RU-O-17375, 5'-CCGAAC 
CACGGGGACGTGGTT-3’ (antisense, EMCV); RU-O-10215, 5’-CATCTATGA 
CCACCTCACACC-3’ (antisense, NS2); RU-O-17391, 5'-CATAGGTTTGCACC
CACACCA-3’ (sense, NS5A); RU-O-17388, 5'-ATTGCCGGAGGCGCGCCTA 
CT-3’ (antisense, 3’ UTR). 

The resulting PCR amplicons were cloned into the pCR2.1 vector using the 
TOPO TA Cloning kit (Invitrogen Life Technologies). Resulting clones were 
screened for proper amplicon insertion size by EcoRI digestion, and sequenced 
using M13F/R primers, as well as additional internal primers for full genomic 
coverage (MacrogenUSA). 

Statistical analysis. Statistical analyses were performed using GraphPad Prism
Software. Statistics were calculated using Kruskal-Wallis one-way analysis of 
variance. P values below 0.05 were considered statistically significant. 
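
For readers who wish to reproduce this type of comparison outside Prism, the Kruskal-Wallis test can be sketched in a few lines. The code below is an illustration only: it uses the Python standard library, computes the H statistic with the usual tie correction, and obtains the P value from the chi-squared approximation (Prism may use an exact method for small samples). The function names and the three example groups are hypothetical and are not data from this study.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks (ties share their mean rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def chi2_sf(x, df):
    """Upper tail of the chi-squared distribution for integer df (closed forms)."""
    y = x / 2.0
    if df % 2 == 0:
        total, term = 0.0, 1.0
        for i in range(df // 2):
            if i > 0:
                term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    for j in range((df - 1) // 2):
        total += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return total


def kruskal_wallis(*groups):
    """Return (H, P) for two or more groups; assumes the values are not all identical."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    h, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        h += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    h /= 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical measurements for three groups of mice (placeholders only).
h_stat, p_value = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h_stat, p_value, p_value < 0.05)
```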
Pathogen blocks host death receptor signalling by
arginine GlcNAcylation of death domains

Shan Li, Li Zhang, Qing Yao, Lin Li, Na Dong, Jie Rong, Wenqing Gao, Xiaojun Ding, Liming Sun, Xing Chen,
She Chen & Feng Shao


The tumour necrosis factor (TNF) family is crucial for immune 
homeostasis, cell death and inflammation. These cytokines are 
recognized by members of the TNF receptor (TNFR) family of death 
receptors, including TNFR1 and TNFR2, and FAS and TNF-related 
apoptosis-inducing ligand (TRAIL) receptors’. Death receptor 
signalling requires death-domain-mediated homotypic/heterotypic 
interactions between the receptor and its downstream adaptors, 
including TNFR1-associated death domain protein (TRADD) and 
FAS-associated death domain protein (FADD). Here we discover 
that death domains in several proteins, including TRADD, FADD, 
RIPK1 and TNFR1, were directly inactivated by NleB, an enteropathogenic
Escherichia coli (EPEC) type III secretion system effector
known to inhibit host nuclear factor-κB (NF-κB) signalling**. NleB
contained an unprecedented N-acetylglucosamine (GlcNAc) trans- 
ferase activity that specifically modified a conserved arginine in 
these death domains (Arg 235 in the TRADD death domain). NleB 
GlcNAcylation (the addition of GlcNAc onto a protein side chain) of 
death domains blocked homotypic/heterotypic death domain inter- 
actions and assembly of the oligomeric TNFR1 complex, thereby 
disrupting TNF signalling in EPEC-infected cells, including NF-κB
signalling, apoptosis and necroptosis. Type-III-delivered NleB
also blocked FAS ligand and TRAIL-induced cell death by prevent- 
ing formation of a FADD-mediated death-inducing signalling com- 
plex (DISC). The arginine GlcNAc transferase activity of NleB was 
required for bacterial colonization in the mouse model of EPEC 
infection. The mechanism of action of NleB represents a new model 
by which bacteria counteract host defences, and also a previously 
unappreciated post-translational modification. 

EPEC contains several type-III-secreted effectors, including NleC/D**,
Tir?’°, NleE and NleB**""”, all of which can inhibit host NF-κB signalling
and pro-inflammatory cytokine production. Among these effectors,
NleB is required for virulence in vivo'*’. NleB homologues are
present in Salmonella and pathogenic E. coli strains’®. Consistent with
previous studies*”, the expression of NleB (E2348C_3231) in HeLa cells
selectively blocked TNF-α but not interleukin (IL)-1β activation of
NF-κB signalling (Supplementary Fig. 1a, b). TNF-α-induced IκB-α
phosphorylation and degradation were both severely inhibited (Supplementary
Fig. 1c). NleB blocked TRAF2- but not TRAF6-induced
NF-κB activation (Supplementary Fig. 2a). TNF-α- but not IL-1-stimulated
TAK1 phosphorylation, downstream of the receptor and
TRAF complex, was diminished by NleB (Supplementary Fig. 2b).
TNFR can also induce apoptosis or necroptosis (in cells deficient in
caspase activity), playing an important role in microbial infection and
inflammation’’. NleB efficiently blocked TNF-α plus cycloheximide-induced
293T cell apoptosis (Fig. 1a), and also blocked necroptosis of
HeLa cells stably expressing the RIP3 protein kinase (HeLa-RIP3 cells)
(Fig. 1b). This agrees with NleB targeting upstream of TAK1, as TAK1 is
not required for TNF-α-induced cell death’*. Thus, NleB differs from
other NF-κB-targeting effectors and can block both NF-κB signalling
and TNF-α-induced cell death.

A yeast two-hybrid screen of a HeLa complementary DNA library 
identified a cDNA clone encoding the death domain of TRADD, a 
universal component of the TNFR1 but not IL-1R complex'*’'. NleB 
also interacted with full-length TRADD, but not with components of 
the TAK1 and IkB kinase (IKK) complexes; no interaction occurred 
between NleE and TRADD (Fig. 1c). Endogenous TRADD was readily 
precipitated by NleB (Supplementary Fig. 3a). The TRADD death 
domain (residues 195-312) was required and sufficient for precipita- 
tion by enhanced green fluorescent protein (eGFP)-NleB (Fig. 1d and 
Supplementary Fig. 3b). Similar to that observed with TNF-α stimulation,
NleB completely abolished TRADD overexpression-induced
NF-κB activation (Fig. 1e) and apoptosis in 293T cells (Fig. 1f and
Supplementary Fig. 4a). Thus, NleB can target TRADD and disrupt
multiple signalling pathways downstream of TNFR1.

NleB did not affect TRADD stability or turnover (Supplementary 
Fig. 5a). TRADD contains an amino-terminal TRAF2-binding domain 
and a carboxy-terminal death domain. The TRADD death domain 
oligomerizes with itself and also with death domains of TNFR1 and 
FADD”. Death-domain-mediated TRADD recruitment to TNFR1 
initiates TNF-α signalling, and TRADD-FADD complex formation
triggers apoptosis”. TRADD readily precipitated TNFR1c (the intra- 
cellular death-domain-containing region) from 293T cells, and this 
was abolished by NleB (Supplementary Fig. 5b). A similar blocking 
effect was observed in TRADD-FADD but not in TRADD-TRAF2 co- 
immunoprecipitation (Supplementary Fig. 5c, 6). TRADD oligomeriza- 
tion is crucial for TNFR1 complex assembly”. TRADD or TRADD 
death domain expressed in 293T cells appeared as a large oligomer on 
a blue native polyacrylamide gel electrophoresis (PAGE) gel, and NleB 
co-expression shifted TRADD and also the TRADD death domain to a 
lower molecular mass position that roughly corresponded to the mono- 
mer form (Fig. 2a). Consistently, NleB inhibited the recruitment of 
TRADD, TRAF2, HOIP, SHARPIN, HOIL-1L and ubiquitinated
RIPK1 to TNFR1 (Fig. 2b). Thus, NleB inactivates the TRADD death
domain and disrupts homotypic/heterotypic death domain interactions 
and the TNFR1 complex assembly. 

In these TRADD oligomerization and binding assays, the amount of
NleB was considerably lower than that of TRADD, indicating that NleB
may act in a catalytic manner on the TRADD death domain. Mass
spectrometry analysis identified a 203-dalton (Da) mass increase on
TRADD death domain co-expressed with NleB in either E. coli (Supplementary
Fig. 7) or 293T cells (Fig. 2c), whereas control TRADD death
domain showed the theoretical molecular mass. The 203-Da mass increase
indicated a GlcNAc modification. Consistently, recombinant NleB efficiently
transferred 3H-GlcNAc, but not 3H-glucose, from radiolabelled
sugar donors onto purified TRADD (Fig. 2d). Cold UDP-GlcNAc, but
not UDP-GalNAc, UDP-glucose, UDP-galactose and UDP-GlcA,
Figure 1 | NleB blocks TNF-α signalling by directly targeting the TRADD
death domain. a, Effects of NleB transfection on TNF-α-induced caspase-3
activation in 293T cells. CHX, cycloheximide. b, Vector- or NleB-transfected
HeLa-RIP3 cells were stimulated with TNF-α plus Smac mimetic (SmacM) and
z-VAD, and cell viability was determined by measuring ATP levels. DMSO,
dimethylsulphoxide. c, Yeast strain was transformed with indicated plasmid
combinations (bait plus prey) to assay the interaction of NleB with TRADD or
other components downstream of TNFR. Yeast strains were grown on SD-LW
(non-selective) and SD-LWHA (selective) media. DD, death domain; Y2H,
yeast two-hybrid. d, Co-immunoprecipitation of NleB with TRADD and
TRADD death domain in 293T cells. e, NF-κB luciferase activity of NleB- and/
or TRADD-transfected 293T cells is shown as fold change normalized to that in
vector-transfected cells. f, Viability of NleB- and/or TRADD-transfected 293T
cells was determined by measuring ATP levels. P < 0.0001 for all (TRADD plus
NleB versus TRADD plus DXD, versus TRADD plus Asp221Ala, and versus
TRADD plus Asp223Ala). For b, e and f, n = 3; data are mean ± s.d.; P value
determined by Student’s t-test. Data in a, c and d are representative of at least
three repetitions.


abolished 3H-GlcNAc labelling of TRADD by NleB (Fig. 2d and
Supplementary Fig. 8). TRADD purified from NleB-expressing cells
was resistant to further in vitro 3H-GlcNAc labelling by recombinant
NleB in the ‘back glycosylation’ assay” (Fig. 2e), confirming the complete
modification of TRADD in NleB-expressing 293T cells. Thus, NleB
is a glycosyltransferase that GlcNAcylates the TRADD death domain.
Extensive mutagenesis of potentially catalytic residues identified
NleB Asp221Ala and Asp223Ala mutants that lost the activity of inhibiting
TNF-α-NF-κB signalling (Supplementary Fig. 9). Interestingly,
the GT-A family of glycosyltransferases”, including the large clostridial
toxins that modify Rho/Ras small GTPases”, features a catalytic
Asp-X-Asp (DXD) motif that coordinates manganese (Mn2+) and/or
the sugar donor. The NleB(Asp221Ala/Asp223Ala) (DXD) mutant
could not block the death-domain-mediated TRADD interaction
with TNFR1c and FADD (Supplementary Fig. 5b, c) or TRADD
self-oligomerization (Fig. 2a). Recombinant NleB DXD mutant exhibited
no catalytic activity of GlcNAcylating TRADD (Fig. 2d); these two
aspartate residues were required for NleB interaction (Supplementary
Fig. 3a) and modification of TRADD in transfected 293T cells (Fig. 2e),
as well as for inhibiting TRADD-induced apoptosis (Fig. 1f and Supplementary
Fig. 4a). Concurrent with our analysis, another study” also
Figure 2 | NleB GlcNAcylates the TRADD death domain and disrupts its
oligomerization. a, Effects of NleB transfection on TRADD oligomerization in 
293T cells. Shown are immunoblots of cell lysates loaded onto native (top) and 
SDS-PAGE (bottom) gels. HA, haemagglutinin; WT, wild type. b, Effects of 
NleB transfection on TNFR1 complex formation in HeLa cells. Ub(m)-RIPK1 
denotes polyubiquitinated RIPK1; asterisk marks a nonspecific band. GST, 
glutathione S-transferase. c, Electrospray ionization (ESI)-mass spectrometry 
determination of the total mass of TRADD death domain immunopurified 
from NleB- or vector-transfected 293T cells. Single and double asterisks denote 
the acetylated TRADD death domain. d, In vitro 3H-UDP-GlcNAc labelling of
TRADD by recombinant NleB. LFn-NleB refers to NleB protein fused 
C-terminally to the N-terminal domain of anthrax lethal factor. e, TRADD 
immunopurified from NleB-transfected 293T cells was subjected to further in 
vitro 3H-UDP-GlcNAc labelling by recombinant NleB. Data in a, b, d and e are
representative of at least three repetitions.


proposes that NleB is a GT-A-type glycosyltransferase according to
fold recognition prediction. That study identifies glyceraldehyde-3-
phosphate dehydrogenase (GAPDH) as a host binding partner of
NleB. In O-GlcNAc antibody blotting, 3H-GlcNAc labelling and mass
spectrometric assays, GAPDH was not glycosylated by NleB in vitro or
in vivo (Supplementary Fig. 10). GAPDH binding may represent a regulatory
mechanism of NleB function, or GAPDH may be a target of an NleB homologue.
EPEC contains an NleB paralogue known as NleB2 (E2348C_1041) and
an NleB homologue, SseK1, is present in Salmonella enterica serovar
Typhimurium. Although SseK1 functioned comparably to NleB in
inhibiting TNF-α-NF-κB signalling and GlcNAcylating the TRADD
death domain, NleB2 showed a lower enzymatic activity and only a
fraction of TRADD was modified by NleB2 in 293T cells (Supplementary
Fig. 11).

Protein O-GlcNAcylation is an O-linked GlcNAc modification generally
occurring on serine/threonine residues of intracellular proteins.
O-GlcNAcylation is abundant from bacteria to mammals, and has been
proposed to be a phosphorylation-like modification in regulating transcription
or signalling in response to nutrients and cellular stresses**”?.
The 203-Da mass increase indicates a single GlcNAc modification to
the TRADD death domain by NleB (Fig. 2c and Supplementary Fig. 7a).
To pinpoint the exact modification site, mutagenesis analyses were first
used given that mapping the O-GlcNAcylation site by the conventional
collision-assisted dissociation (CAD) mass spectrometry is technically
difficult owing to the labile nature of GlcNAc modification”’. Surprisingly,
no single mutation of any of the ten serine/threonine residues in
the TRADD death domain disrupted its modification by NleB (Supplementary
Fig. 12). The ST10A mutant (all ten serine/threonine residues
were mutated simultaneously) was still GlcNAcylated to a considerable
extent (Supplementary Fig. 12). Thus, NleB-induced modification may 
not be a canonical O-GlcNAcylation. The NleB-modified TRADD 
death domain (Supplementary Fig. 13) was then subjected to in-depth 
mass spectrometry analyses. Among the 13 tryptic peptides that cover 
the large majority of the TRADD death domain sequence, two over- 
lapping ones (232-KVGRSLQR-239 and 233-VGRSLQR-239) showed 
a 203-Da mass increase (Supplementary Fig. 14). Electron-transfer 
dissociation (ETD) tandem mass spectrometry, which works better 
in preventing loss of the labile GlcNAc during peptide fragmentation”, 
revealed that the 203-Da mass increase occurred on Arg 235 (Fig. 3a). 
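
The residue numbering behind this assignment can be made explicit with a short sketch. The code below is illustrative only; the helper names are hypothetical. It maps each residue of the two overlapping tryptic peptides to its position in TRADD and lists the serine, threonine and arginine residues that could in principle carry the modification. Peptide-level mass data alone leave Arg 235, Ser 236 and Arg 239 as candidates, which is why fragmentation by ETD was needed to assign the site to Arg 235.

```python
def numbered_residues(peptide, start):
    """Pair each residue with its absolute position, given the first residue number."""
    return [(start + i, aa) for i, aa in enumerate(peptide)]


def candidate_sites(peptide, start, residues="STR"):
    """List residues of the given types (default Ser, Thr, Arg) with their positions."""
    return [(pos, aa) for pos, aa in numbered_residues(peptide, start) if aa in residues]


# The two overlapping peptides reported to carry the 203-Da increase.
long_peptide = candidate_sites("KVGRSLQR", 232)
short_peptide = candidate_sites("VGRSLQR", 233)

# Candidate sites present in both peptides.
shared = sorted(set(long_peptide) & set(short_peptide))
print(shared)
```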


Figure 3 | NleB GlcNAcylates Arg 235 that is required for TRADD 
function. a, ETD-tandem mass spectrum of a triply charged Arg-235- 
containing tryptic peptide from NleB-modified TRADD death domain in 
bacteria. The fragmentation patterns that generate the observed c and z ions are 
illustrated along the peptide sequence shown on top of the spectrum. b, ESI- 
mass spectrometry determination of the total mass of TRADD arginine 
mutants immunopurified from NleB- or vector-transfected 293T cells. All 
arginine residues in the TRADD death domain were individually mutated into 
alanine (also see Supplementary Fig. 17). Asterisk denotes the acetylated 
TRADD death domain Arg235Ala mutant. c, In vitro 3H-UDP-GlcNAc
labelling of TRADD death domain arginine mutants by recombinant NleB. 
d, TRADD death domain or its Arg235Ala mutant immunopurified from 
NleB-transfected 293T cells was subjected to chemoenzymatic UDP-GalNAz 
labelling followed by reaction with an alkyne-biotin derivative. e, f, TRADD or 
its Arg 235 mutants were expressed in TRADD-knockdown 293T cells. e, NF-κB
luciferase activity (fold change) is normalized to that in vector-transfected
cells (n = 3; mean ± s.d.; Student’s t-test). Data in c, d and f are representative
of at least three repetitions.


When the Arg-235-containing tryptic peptide was purified and sequen- 
tially digested by proteinase K and carboxypeptidase, the GlcNAc- 
modified dipeptide (234-Gly-Arg-235) and even Arg 235 alone were 
identified by higher-energy collision dissociation tandem mass spectro- 
metry (Supplementary Fig. 15). When the modification was performed 
in cells metabolically labelled with 2-(acetyl-d3-amino)-2-deoxy-
1,3,4,6-tetra-O-acetyl-D-glucopyranose (Ac4GlcNAc-d3), a mass increase
of 206 Da, corresponding to modification by the deuterium-labelled 
GlcNAc, occurred on Arg235 in the TRADD death domain (Sup- 
plementary Fig. 16). 
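
The 203-Da and 206-Da values follow from the composition of the transferred sugar. As a worked check (a sketch, not part of the original analysis), the code below sums standard monoisotopic atomic masses for the GlcNAc residue added on glycosylation, C8H13NO5 (GlcNAc minus one water), and for the same residue with the three N-acetyl methyl hydrogens replaced by deuterium. The dictionary and function names are chosen for illustration.

```python
# Monoisotopic atomic masses in daltons (standard values; D = deuterium).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "D": 2.01410178,
    "N": 14.0030740,
    "O": 15.9949146,
}


def monoisotopic_mass(composition):
    """Sum atomic masses for a composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


# GlcNAc (C8H15NO6) loses one H2O when attached to a protein side chain.
GLCNAC_RESIDUE = {"C": 8, "H": 13, "N": 1, "O": 5}
# Same residue carrying a trideuterated N-acetyl group.
GLCNAC_D3_RESIDUE = {"C": 8, "H": 10, "D": 3, "N": 1, "O": 5}

shift = monoisotopic_mass(GLCNAC_RESIDUE)
shift_d3 = monoisotopic_mass(GLCNAC_D3_RESIDUE)
print(round(shift, 4), round(shift_d3, 4))
```

The difference between the two shifts, about 3 Da, corresponds to the three deuterium atoms of the labelled acetyl group.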

NleB GlcNAcylation of TRADD Arg235 was supported by four
additional lines of evidence. Arg235Ala and Arg235Lys mutations in the
TRADD death domain abolished the GlcNAc modification by NleB 
in 293T cells, whereas mutation in any of the other ten arginine 
residues had no such effect (Fig. 3b and Supplementary Fig. 17). The 
purified TRADD(Arg235Ala) mutant resisted NleB-catalysed in vitro 
GlcNAcylation (Fig. 3c). When chemoenzymatic labelling of the termi- 
nal GlcNAc was used to detect GlcNAcylation, the wild-type TRADD 
death domain from NleB-expressing 293T cells was readily labelled with
N-azidoacetylgalactosamine (GalNAz) by an engineered galactosyltransferase
(Tyr289Leu mutant of GalT1), whereas no evident GalNAz
labelling occurred on TRADD(Arg235Ala) (Fig. 3d). Lastly, NleB-catalysed
GlcNAc modification was not reversed by O-GlcNAcase, β-N-acetylglucosaminidase
and β-N-acetylhexosaminidases (a recombinant
protein fusion of β-N-acetylhexosaminidase and maltose binding
protein) (Supplementary Fig. 18), enzymes that are capable of remov- 
ing O-linked or terminal GlcNAc. 

TRADD-knockdown 293T cells were generated (Supplementary 
Fig. 19a). Knockdown of endogenous TRADD minimized its hetero- 
oligomerization and interference with transfected TRADD mutants. 
In contrast to wild-type TRADD, expression of TRADD(Arg235Ala) 
in TRADD-knockdown cells was largely deficient in activating
NF-κB signalling (Fig. 3e and Supplementary Fig. 19b). Consistently,
lysine or alanine substitution of Arg 235 in TRADD completely
or nearly completely disrupted TRADD self-oligomerization, and the 
residual Arg235Ala oligomer was insensitive to further NleB express- 
ion (Fig. 3f). Thus, Arg 235 is important for TRADD function and 
activity. 

Arg 235 is conserved in one-third of the total of more than 30 death- 
domain-containing proteins in humans” (Supplementary Fig. 20), 
including FADD, TNFR1, RIPK1, FAS and death receptor-3/4/5
(DR3/4/5) that function in death receptor signalling. NleB completely 
modified the death domains of TNFR1 and RIPK1 as well as full- 
length FADD in bacteria (Supplementary Fig. 21) or 293T cells (Sup- 
plementary Fig. 22); a considerable portion of FAS death domain was 
also GlcNAc-modified. Death domains of MYD88 and IRAK1 devoid 
of the conserved arginine were not modified by NleB (Supplementary 
Fig. 22). In the 3H-GlcNAc labelling assay, FADD and death domains
of TNFR1 and RIPK1 were GlcNAcylated by recombinant NleB, with
efficiency comparable to that of the TRADD death domain (Fig. 4a).
Induction of apoptosis by FAS ligand (FasL) and TRAIL requires death-
domain-mediated FADD interaction with the receptors and formation
of a caspase-8-containing death-inducing signalling complex (DISC).
Consistently, anti-FAS antibody and TRAIL-stimulated HeLa cell 
apoptosis was inhibited by NleB, but not by the GlcNAc-transferase- 
deficient DXD mutant or the NleE effector (Fig. 4b). 

293T cells expressing various death domains or death domain 
proteins were infected with an NleB-proficient or -deficient EPEC 
strain. FADD, RIPK1 death domain and a portion of TRADD death 
domain from NleB-positive infection showed a 203-Da mass increase 
(Fig. 4c). Modification of these proteins was not observed with the 
NleB-deficient strain. Importantly, no GlcNAc modification occurred 
on TRADD(Arg235Ala) (Fig. 4c). Consistent with these observations, 
expression of NleB, but not the DXD mutant, in a NleB/E double- 
deletion EPEC strain blocked p65 nuclear translocation in infected 
mouse embryonic fibroblast (MEF) cells (Fig. 5a and Supplementary 
Fig. 23a). GlcNAc-transferase-active NleB inhibited apoptosis in EPEC-infected 
HeLa cells after stimulation by TNF-α, FasL or TRAIL (Fig. 5b 
and Supplementary Fig. 4b). Type-III-delivered NleB, but not the 
DXD mutant, also effectively blocked TNF-α-induced necroptosis in 
HeLa-RIP3 or HT-29 cells (Supplementary Fig. 24). Furthermore, 
TNF-α-induced recruitment of TRADD, TRAF2, SHARPIN and ubiquitinated 
RIPK1 to TNFR1 as well as TRAIL-induced DISC formation 
were both disrupted by EPEC expressing the catalytically active NleB, 
and disruption of DISC formation was more severe (Fig. 5c and 
Supplementary Fig. 23b). When expressed in the type III secretion-deficient 
ΔescN strain, NleB did not affect cell death and death receptor 
complex assembly (Fig. 5a–c and Supplementary Figs 23b and 24). 
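
The 203-Da mass increase reported above is the shift expected when one N-acetylglucosamine residue is attached to a protein (free GlcNAc, C8H15NO6, minus one water lost on attachment). The following is a minimal sketch of that arithmetic; the function names and the 1-Da tolerance are illustrative assumptions and are not part of the analysis described in the paper.

```python
# Monoisotopic atomic masses in Da.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


# Free GlcNAc is C8H15NO6; attaching it to a protein releases one H2O.
GLCNAC = {"C": 8, "H": 15, "N": 1, "O": 6}
WATER = {"H": 2, "O": 1}
GLCNAC_SHIFT = mass(GLCNAC) - mass(WATER)  # about 203.08 Da


def glcnac_count(observed_shift, tol=1.0):
    """Number of GlcNAc additions consistent with a mass shift, else None.

    `tol` (Da) is an assumed tolerance, not a value from the paper.
    """
    n = round(observed_shift / GLCNAC_SHIFT)
    if n >= 0 and abs(observed_shift - n * GLCNAC_SHIFT) <= tol:
        return n
    return None
```

Under these assumptions, `glcnac_count(203)` reports a single modification, whereas a shift that is not close to a multiple of about 203 Da returns `None`.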

Deletion of nleB from Citrobacter rodentium results in severely 
reduced bacterial colonization in infected mice. C. rodentium NleB 
(NleBc) inhibited TNF-α signalling to an extent comparable to that of 
EPEC NleB (Supplementary Fig. 25a); NleBc also GlcNAcylated several 
death domain proteins, including FADD, TNFR1 and RIPK1, despite 
a slightly different substrate preference (Supplementary Fig. 25b). 
In C.-rodentium-inoculated C57BL/6 mice, the ΔnleB mutant showed 
significantly reduced colonization compared with the wild-type strain, 

Figure 5 | Disrupting several death receptor pathways by NleB GlcNAc 
transferase activity that is required for bacterial colonization in vivo. a, MEF 
cells infected with indicated EPEC strains were stimulated with TNF-α, and 
statistics of cells with nuclear-localized p65 are shown. Representative 
fluorescence images are in Supplementary Fig. 23a. b, HeLa cells infected with 
indicated EPEC strains were stimulated with TNF-α, FasL or TRAIL, and cell 
viability was determined by measuring ATP levels. For a and b, n = 3; 
mean ± s.d. c, Lysates of EPEC-infected HeLa cells stimulated with TRAIL 
were subjected to anti-caspase-8 (CASP8) immunoprecipitation. Data 
represent at least three repetitions. d, e, Five-to-six-week-old C57BL/6 mice were 
orally gavaged with indicated C. rodentium DBS100 strains. Viable stool bacterial 
counts (d), measured at indicated points after inoculation, are shown as 
mean ± s.e.m. of log10 colony-forming units (CFU) per gram faeces, and 
bacterial colonization of the intestine 8 days after infection (e) is shown as the 
mean ± s.e.m. of log10 CFU per gram colon (n > 6). **P < 0.01 (Student’s 
t-test). pNleB(c) and pNleB(c)-DXD are rescue plasmids expressing wild-type 
and the DXD mutant of NleB(c), respectively. 

demonstrated by colony-forming units of bacteria recovered from stool 
samples and colons from infected mice (Fig. 5d, e). Complementation 
of ΔnleB with NleBc or EPEC NleB, but not the DXD mutant, recovered 
stool counts to the level of the wild-type strain and restored bacterial 
colonization in the intestinal tract of mice (Fig. 5d, e and Supplementary 
Fig. 26). Thus, the arginine GlcNAc transferase activity of NleB is 
crucial for bacterial colonization and virulence in mice. 

We discover that the EPEC type III effector NleB targets several 
death-domain-containing proteins in TNFR, FAS and TRAIL death 
receptor complexes to block host cell death. NleB probably functions 
as part of a network of effectors during EPEC interaction with the host. 
NleB is the first bacterial virulence factor found to hijack the death receptor 
complex. Such a distinctive feature may account for the unique virulence 
activity of NleB. TRADD and FADD are central adaptors in 
death receptor signalling. TRADD also mediates Toll-like receptor 
signalling in macrophages, suggesting an even broader function 
of NleB in EPEC modulation of host defences. NleB modifies its death 
domain targets by GlcNAcylation on a conserved arginine. The discovery of 
arginine GlcNAcylation is conceptually unexpected despite one early 
preliminary study proposing such a modification on a corn protein. 
The labile GlcNAc modification is refractory to conventional mass- 
spectrometry-based proteomic identifications that are performed in 
the absence of an arginine GlcNAcylation hypothesis. Thus, arginine 
GlcNAcylation might be widely used and represent a previously unap- 
preciated mechanism in signalling. 


METHODS SUMMARY 


The EPEC E2348/69 ΔnleB/E SC3909 strain (ΔIE2::kan and nleBE IE6::tet) was 
used for cell culture infection. A single bacterial colony was inoculated into 0.5 ml of 
LB medium and statically cultured overnight at 37 °C. Bacterial cultures were then 
diluted 1:40 in DMEM supplemented with 1 mM isopropyl-β-D-thiogalactoside 
(IPTG) and cultured for an additional 4 h at 37 °C in the presence of 5% CO2. 
Infection was performed at a multiplicity of infection of 200:1 in the presence of 
1 mM IPTG for 2 h. Cells were washed four times with PBS and bacteria were killed 
by 200 µg ml⁻¹ gentamicin. To assay NleB-induced modification, 293T cells were 
transfected with pCS2-Flag-TRADD death domain plasmids 24 h before infection. 
z-VAD (Sigma) (20 µM) was added to inhibit apoptosis. Lysates of infected cells 
were subjected to anti-Flag immunoprecipitation to purify the TRADD death 
domain for mass spectrometry analysis. All independent experiments carried out 
in this study and indicated in the figure legends were biological replicates. 
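
The dilution and multiplicity-of-infection steps above reduce to simple arithmetic, sketched below. This is an illustration only: the function names are hypothetical, "1:40" is assumed to mean one part culture in 40 parts final volume, and the bacterial density passed to `inoculum_ml` has to be measured for each culture (the paper gives no value).

```python
def dilution_volumes(total_ml, ratio):
    """Return (culture_ml, medium_ml) for a 1:ratio dilution.

    Assumes 1:ratio means one part culture in `ratio` parts final volume.
    """
    culture_ml = total_ml / ratio
    return culture_ml, total_ml - culture_ml


def inoculum_ml(host_cells, moi, cfu_per_ml):
    """Culture volume (ml) that delivers `moi` bacteria per host cell.

    `cfu_per_ml` is the measured density of the bacterial culture.
    """
    return host_cells * moi / cfu_per_ml
```

For example, under that reading of 1:40, a 0.5 ml overnight culture would be brought to 20 ml with 19.5 ml of medium, and an infection of one million host cells at 200:1 needs 2 × 10^8 bacteria in total.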


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 
Plasmid, antibodies and reagents. DNA for NleB and NleB homologue genes 
was amplified from genomic DNA of EPEC E2348/69, C. rodentium DBS100 and 
S. enterica Typhimurium LT2 strains, and inserted into pCS2-EGFP, pCS2-HA 
and pcDNA4-Flag-HA vectors for mammalian expression, and pGEX-6P-2, 
pET28a-LFn and pSUMO for expression in E. coli. NleB DNA was also ligated 
into the pTRC99A vector for complementation in EPEC (under the trc promoter) 
and pET28a for complementation in C. rodentium (under the C. rodentium nleB 
promoter). cDNAs for TRADD, RIPK1, TRAF2/5/6, IKK-α/β/γ, TAK1, TAB2/3 
and IκB-α were amplified from a HeLa cDNA library as previously described. 
TRADD death domain refers to residues 195–312 of human TRADD. cDNAs for 
cIAP1/2 and TNFR1 were gifts from X. Wang and H. Wu, respectively. cDNAs for 
FAS, FADD, IRAK1, GAPDH, MYD88 and OGT were amplified from human 
ultimate ORF clones (Invitrogen). For mammalian expression, cDNAs were cloned 
into pCS2-EGFP and pCS2-3Flag vectors. DNA encoding residues 31-624 of 
O-GlcNAcase was amplified from the Clostridium perfringens genome, and cloned 
into the pGEX-6P-2 vector for recombinant expression. For yeast two-hybrid analysis, 
DNAs encoding NleB, TAK1 and NleE were cloned into the bait vector 
pGBKT7; cDNAs for TRADD, RIPK1, TRAF2/5, IKKα/β/γ, TAK1, TAB1, 
cIAP1/2 and FADD were cloned into the prey vector pGAD-GH. The yeast two-hybrid 
interaction assay was performed using the Matchmaker 2-hybrid system 
(Clontech) following the manufacturer’s instructions. All single point mutations 
were generated by QuikChange site-directed mutagenesis kit (Stratagene), and multiple 
point mutations and truncation mutants were generated by standard molecular 
biology procedures. NF-κB reporter plasmids were previously described. Plasmids 
were prepared by GoldHi endofree plasmid maxi kit (CW2104, Beijing CoWin 
Bioscience) for transfection. All plasmids were verified by DNA sequencing. 
Antibodies for IκB-α (44D4, catalogue number 4812), p-IκB-α (S32) (14D4, 2589), 
PARP (9542), caspase-8 (1C12, 9746, for immunoblotting) and caspase-3 (9662) 
were purchased from Cell Signaling Technology. Antibodies for GFP (sc-8334), 
p65 (sc-372), TNFR1 (H271, sc-7895), TRAF2 (C-20, sc-876), TRAF6 (H274, 
sc-7221), haemagglutinin probe (Y-11, sc-805), caspase-8 (C-20, sc-6136, for immunoprecipitation), 
FADD (H-181, sc-5559) and TAK1 (sc-7162) were all from Santa 
Cruz Biotechnology. Antibodies for TRADD (610573) and RIPK1 (610459) were 
from BD Transduction Laboratories. Flag (M2), tubulin and actin antibodies were 
from Sigma. Horseradish peroxidase (HRP)-conjugated anti-biotin antibody was 
from Abcam (ab19221); p-TAK1 (Thr187) antibody was a gift from H. Sakurai; 
antibodies for HOIL-1L (2E2), HOIP (N1) and SHARPIN were provided by 
K. Iwai. Homemade SmacM compound was described previously. Caspase-8 peptide 
(FFIQACQGDNYQKGIPVETD) was commercially synthesized (SciLight-Peptide). 
Cell culture products were from Invitrogen; all other reagents were from Sigma 
unless noted. 
Cell culture and luciferase reporter assay. 293T and HeLa cells obtained from 
the American Type Culture Collection (ATCC) and MEF cells from S. Ghosh were 
grown in DMEM (HyClone) supplemented with 10% FBS, 2 mM L-glutamine, 
100 U ml⁻¹ penicillin and 100 mg ml⁻¹ streptomycin. HT-29 cells (ATCC) were 
maintained in McCoy’s 5A media supplemented with 10% FBS and 2 mM 
L-glutamine. Cells were cultivated in a humidified atmosphere containing 5% 
CO2 at 37 °C. Vigofect (Vigorus) was used for transfection following the manufacturer’s 
instructions. Luciferase activity was determined 24 h after transfection 
by using the dual luciferase assay kit (Promega) according to the manufacturer’s 
instructions. Detailed procedures were previously described. One-hour CHX 
pretreatment (1 µg ml⁻¹) was used to sensitize TRAIL (200 ng ml⁻¹) and FAS 
antibody (1 µg ml⁻¹) stimulation of apoptosis, and the concentration of CHX 
used for TNF-α (10 ng ml⁻¹) was 4 µg ml⁻¹. SmacM (100 pM) and z-VAD (20 µM; 
Sigma) were used to induce necroptosis. 
Stable cell-line construction. To generate NleB stable expression cells, empty 
pCDNA4 vector or pCDNA4-NleB plasmid was transfected into 293T cells. 
Forty-eight hours after transfection, cells were subcultured in the complete 
DMEM medium supplemented with 100 µg ml⁻¹ zeocin (Invitrogen). Two-to-three 
weeks later, clones were lifted and tested for expression of the transgene. To 
generate TRADD stable knockdown cells, pLKO.1 or pLKO.1-shTRADD (TRC 
number TRCN0000008020, Sigma) was transfected into 293T cells. Forty-eight 
hours after transfection, cells were subcultured in the complete DMEM medium 
containing 1 µg ml⁻¹ puromycin. Two-to-three weeks later, clones were lifted, the 
culture was expanded, and expression of endogenous TRADD was tested by 
immunoblotting analysis. The HeLa-RIP3 stable cell line used for TNF-α induction 
of necroptosis was generated as previously described and maintained in 
DMEM medium supplemented with 10% tetracycline-free FBS, 2 mM L-glutamine, 
100 U ml⁻¹ penicillin, 100 mg ml⁻¹ streptomycin, 10 µg ml⁻¹ blasticidin and 1 mg 
ml⁻¹ G418. 
Immunoprecipitation and receptor complex pulldown. For co-immunopreci- 
pitation, 293T cells at a confluency of 60-70% in 6-well plates were transfected 
with a total of 5 µg plasmids. Twenty-four hours after transfection, cells were washed 
once in PBS and lysed in buffer A containing 25 mM Tris-HCl, pH 7.6, 150 mM 
NaCl, 10% glycerol and 1% Triton X-100, supplemented with a protease inhibitor 
mixture (Roche Molecular Biochemicals). Pre-cleared lysates were subjected to anti- 
Flag M2 immunoprecipitation following the manufacturer’s instructions. The beads 
were washed four times with lysis buffer and the immunoprecipitates were eluted by 
2× SDS sample buffer followed by standard immunoblotting analysis. All the immunoprecipitation 
assays were performed more than three times and representative 
results are shown. 

To purify death domains or death-domain-containing proteins for mass spectro- 
metry, 293T cells overexpressing death domain or death-domain-containing proteins 
were collected in buffer B containing 50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 20 mM 
n-octyl-β-D-glucopyranoside (INALCO) and 5% glycerol, supplemented with an 
EDTA-free protease inhibitor mixture (Roche Molecular Biochemicals). Cells were 
lysed by ultrasonication. The supernatant was pre-cleared by protein G-sepharose at 
4 °C for 1 h and subjected to anti-Flag (M2) immunoprecipitation. After 4 h incubation, 
the beads were washed once with buffer B and then four times with TBS buffer 
(50 mM Tris-HCl, pH 7.5, and 150 mM NaCl). Bound proteins were eluted with 
600 µg ml⁻¹ Flag peptide (Sigma) in TBS buffer. The eluted protein was verified by 
Coomassie brilliant blue staining on an SDS-PAGE gel before mass spectrometry 
analysis. 

For TNFR complex pulldown, HeLa cells were treated with 1 µg ml⁻¹ recombinant 
GST-TNF-α for indicated lengths of time (for t = 0, GST-TNF-α was added 
after the cells were lysed). Cells were lysed in the GST pulldown buffer containing 
25 mM Tris-HCl, pH 7.6, 150 mM NaCl, 25 mM β-glycerophosphate, 1 mM 
sodium orthovanadate, 10% glycerol, 0.5 mM dithiothreitol (DTT), 1 mM phenyl- 
methylsulphonyl fluoride (PMSF) and 1% Triton X-100, supplemented with the 
protease inhibitor mixture. Total cell lysates were incubated with glutathione 
sepharose 4B beads and mixed at 4°C for 4 h. The beads were washed once with 
PBS plus 1% Triton X-100, and twice with PBS plus 0.5% Triton X-100. Bead-bound 
proteins were analysed by immunoblotting using indicated antibodies. For DISC 
complex pulldown, HeLa cells infected with indicated EPEC strains were stimulated 
with 200 ng ml⁻¹ TRAIL for 2 h. Anti-caspase-8 immunoprecipitation was then 
performed according to the published literature with minor modifications. Cells 
were lysed in DISC immunoprecipitation buffer (50 mM Tris-HCl, pH 7.5, 150 mM 
NaCl, 10% glycerol and 1% Triton X-100) supplemented with 1% protease inhibitor, 
10 mM NaF, 2 mM Na3VO4, 0.1 mM PMSF and 1 mM EDTA. Caspase-8 
antibody (1.5 µg) was coupled to 20 µl protein G-sepharose in TBS (supplemented 
with 7 mg ml⁻¹ BSA). The pre-coupled sepharose was then washed and incubated 
with 2 mg of cell lysates overnight at 4 °C. Following extensive washes with DISC 
immunoprecipitation buffer (four times) and TBS (once), the immunocomplexes 
were eluted using 40 µl of TBS containing 1 mg ml⁻¹ caspase-8 peptide and 1% 
protease inhibitor at room temperature for 1 h. The eluted proteins were analysed 
by SDS-PAGE and immunoblotting. 

Immunofluorescence. For immunofluorescence, rhodamine-phalloidin (Invitrogen) 
staining of actin in transfected HeLa cells or pedestal in EPEC-infected HeLa 
cells and p65 staining of NF-κB activation in TNF-α-treated HeLa cells were 
performed as previously described. 

Expression and purification of recombinant proteins. Protein expression was 
induced in E. coli BL21 (DE3) strain (Novagen) at 23 °C for 15 h with 0.4 mM 
isopropyl-β-D-thiogalactopyranoside (IPTG) after absorbance at 600 nm (A600 nm) 
reached 0.8–1.0. Affinity purification of GST-TRADD death domain or 6×His-SUMO-NleB 
and LFn-NleB proteins was performed using glutathione sepharose 
(GE Healthcare) and Ni-NTA agarose (Qiagen), respectively, following the manufacturers’ 
instructions. GST-TRADD death domain was further purified by ion 
exchange chromatography and concentrated in a buffer containing 20 mM HEPES, 
pH 7.5, 150 mM NaCl and 5% glycerol. 

Native PAGE. Blue native gel electrophoresis was performed to examine TRADD 
oligomerization using the NativePAGE Bis-Tris gel system from Invitrogen. In 
brief, Flag-TRADD was transfected into intact 293T or TRADD knockdown 293T 
cells for 24 h. Transfected cells were washed twice with cold PBS and lysed in 1% 
digitonin-containing native lysis buffer (50 mM Bis-Tris, pH 7.2, 50 mM NaCl, 
10% (w/v) glycerol, 0.001% Ponceau S, 1% digitonin, 2 mM Na3VO4 and 25 mM 
NaF) supplemented with the EDTA-free protease inhibitor cocktail. Cell lysis was 
performed on ice for 30 min, and cell debris was removed by centrifugation 
(16,000g, 20 min) at 4 °C. Lysates were separated by NativePAGE using the 
Novex Bis-Tris gel system (Invitrogen). Native gels were soaked in 10% SDS for 
5 min before transfer to PVDF membrane (Millipore) for immunoblotting analysis. 
Chemoenzymatic labelling-based GlcNAcylation detection. To detect NleB 
GlcNAcylation of TRADD death domain in 293T cells, immunopurified TRADD 
death domain was chemoenzymatically labelled using the Click-iT O-GlcNAc 
enzymatic labelling system (Invitrogen) and GlcNAc modification was detected 
by Click-iT protein analysis detection kits (Invitrogen) following the manufacturer’s 
protocol. In brief, 1 × 10° 293T cells were transfected with indicated plasmids and 
anti-Flag immunoprecipitation was carried out as described earlier. The immunoprecipitates 
(~200 µg) were eluted with 1% SDS in 20 mM HEPES, pH 7.9, at 95 °C 
for 5 min, and then subjected to UDP-GalNAz labelling using the mGalT1 enzyme 
(Tyr289Leu mutant of galactosyltransferase GalT1) and a biotinylated-alkyne/azide 
click chemistry conjugation. The GlcNAc modification was detected by immuno- 
blotting analysis using an anti-biotin antibody. 

In vitro ³H-UDP-GlcNAc labelling. Flag-TRADD or other death domain proteins 
expressed in 293T cells were immunopurified and immobilized on the Flag 
M2 beads. The beads were incubated with 5 µg NleB for 2 h at 37 °C in 40 µl buffer 
containing 20 mM HEPES, pH 7.5, 100 mM KCl, 2 mM MgCl2, 1 mM MnCl2 and 
0.4 µCi (0.2 µM) of ³H-UDP-GlcNAc (Perkin Elmer). The reaction mixtures were 
separated on a 12% SDS-PAGE gel followed by Coomassie blue staining. 
Incorporation of ³H-UDP-GlcNAc was visualized by ³H autoradiography. For 
the ligand competition, 10 µM cold UDP-activated sugars were included in the 
GlcNAcylation reaction. 

Liquid chromatography-mass spectrometry analysis. E. coli or 293T-cell purified 
death domain proteins were loaded onto a homemade capillary column 
(150 µm ID, 3-cm long) packed with Poros R2 media (AB-Sciex), and eluted by 
an Agilent 1100 binary pump system with the following solvent gradient: 0–100% 
B in 60 min (A = 0.1 M acetic acid in water; B = 0.1 M acetic acid, 40% acetonitrile 
and 40% isopropanol). The eluted proteins were sprayed into a QSTAR XL mass 
spectrometer (AB-Sciex) equipped with a Turbo Electrospray ion source. The 
instrument was operated in mass spectrometry mode with a spray voltage of 5 kV. 
The protein charge envelope was averaged across the corresponding protein 
elution peaks, and deconvoluted into non-charged forms by the BioAnalyst software 
provided by the manufacturer. 
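
The idea behind deconvoluting a charge envelope into a neutral mass can be sketched as follows. This is not the BioAnalyst algorithm, whose internals are not described here; it is a textbook two-adjacent-peak calculation that assumes positive-mode ions carrying only protons, and every name in it is hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def charge_from_adjacent(mz_low, mz_high):
    """Charge of the ion at mz_high, given the neighbouring peak at mz_low.

    The peak at mz_low is assumed to carry exactly one more proton, so
    z * (mz_high - PROTON) == (z + 1) * (mz_low - PROTON).
    """
    return round((mz_low - PROTON) / (mz_high - mz_low))


def neutral_mass(mz, z):
    """Neutral mass of an ion observed at m/z `mz` with `z` protons."""
    return z * (mz - PROTON)


def deconvolute(peaks):
    """Average neutral mass from adjacent charge-state peaks (ascending m/z)."""
    masses = []
    for mz_low, mz_high in zip(peaks, peaks[1:]):
        z = charge_from_adjacent(mz_low, mz_high)
        masses.append(neutral_mass(mz_high, z))
    return sum(masses) / len(masses)
```

Each adjacent pair of peaks gives an independent estimate of the neutral mass, and averaging the estimates mirrors, in a very reduced form, the averaging across the envelope described above.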

MALDI-TOF mass spectrometry analysis of tryptic peptides. Purified GST- 
TRADD death domain proteins were in-solution cleaved by PreScission, and GST 
was depleted by glutathione sepharose 4B. TRADD death domain co-expressed 
with SUMO-NleB was completely soluble, whereas TRADD death domain 
expressed alone aggregated to some extent after PreScission cleavage. After cent- 
rifugation at 3,200g for 1 min to remove the beads, the supernatant containing 
TRADD death domain was digested with sequencing-grade modified trypsin 
(Promega) at 37 °C for at least 3 h in 50 mM ammonium bicarbonate. The digested 
peptide samples were analysed on an Autoflex II MALDI-TOF/TOF (matrix- 
assisted laser desorption/ionization-time of flight) mass spectrometer (Bruker) 
equipped with a nitrogen pulsed laser. In brief, equal volumes of the peptide 
samples and 2,5-dihydroxybenzoic acid solution (Agilent) were mixed together 
and spotted on a Bruker MTP 384 massive stainless steel MALDI target. The 
matrix spots were allowed to dry at room temperature and then washed on-target 
with 0.1% trifluoroacetic acid to remove salts and other water-soluble contaminants. 
Peptide mass fingerprinting spectra were acquired in the positive reflector 
mode with pulsed ion extraction. 

To identify GlcNAcylated arginine and arginine-containing peptides, the trypsin-digested 
TRADD death domain (modified by NleB in bacteria) was separated 
on an Agilent Ellipse C18 reversed phase column (4.6 × 150 mm) using the 
Agilent 1260 Infinity HPLC system. The HPLC gradient was as follows: 0–20% 
B in 10 min, 20–100% B in 20 min (solvent A = 10 mM ammonium acetate, solvent 
B = 10 mM ammonium acetate in 80% acetonitrile). The ultraviolet detector 
wavelength was 215 nm and the fractionation size was 300 µl. Peptides from each 
fraction were detected by MALDI-TOF on Autoflex II mass spectrometer. The 
HPLC fraction containing peptide with m/z matching 233-VGR(GlcNAc)SLQR-239 
was concentrated to ~50 µl and then digested with 
proteinase K (New England Biolabs) overnight at 37 °C. To obtain the single 
arginine with GlcNAc modification, the proteinase-K-treated peptides were 
digested further with carboxypeptidase A (Sigma) overnight at 37 °C. The digests 
were loaded into a nanoES emitter and sprayed directly into an LTQ Orbitrap Velos 
mass spectrometer equipped with a nanoESI ion source. The dipeptide GR(GlcNAc) 
and the single arginine with a GlcNAc modification were manually selected for 
high-energy collision dissociation tandem mass analysis. 

ETD-MS analysis of GlcNAc-modified peptides. To determine the exact 
GlcNAcylation site on TRADD proteins after NleB modification, purified 
TRADD death domain protein co-expressed with SUMO-NleB in E. coli was 
trypsin digested and the resulting tryptic peptides were analysed on an LTQ 
Orbitrap XL mass spectrometer (Thermo Fisher) equipped with an ETD ion 
source and a nanoESI ion source. The peptide solution was loaded into a 
nanoES emitter and sprayed directly into the instrument. The peptides with m/z 
matched to calculated GlcNAc-modified TRADD peptides were manually selected 
for ETD fragmentation with a 2-Da mass selection window. 
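
Selecting precursors whose m/z matches a calculated GlcNAc-modified peptide requires the expected m/z for each charge state. A minimal sketch of that calculation for the TRADD tryptic peptide 233-VGRSLQR-239 named in these Methods is given below. The residue masses are standard monoisotopic values; the function name and the charge states tried are assumptions for illustration, not the authors' procedure.

```python
# Monoisotopic residue masses (Da) for the amino acids in VGRSLQR.
RESIDUE = {
    "G": 57.02146, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "Q": 128.05858, "R": 156.10111,
}
WATER = 18.01056    # added once per peptide for the free termini
PROTON = 1.007276
GLCNAC = 203.07937  # mass added by one GlcNAc residue


def peptide_mz(seq, charge, n_glcnac=0):
    """Expected m/z of a peptide carrying n_glcnac GlcNAc groups."""
    neutral = sum(RESIDUE[aa] for aa in seq) + WATER + n_glcnac * GLCNAC
    return (neutral + charge * PROTON) / charge


# Singly and doubly protonated forms of the singly modified peptide.
targets = {z: peptide_mz("VGRSLQR", z, n_glcnac=1) for z in (1, 2)}
```

With a 2-Da selection window, as used above, a precursor would be accepted if its m/z lies within 1 Da of one of the values in `targets`.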

Stable isotope labelling. Isotope-labelled Ac4GlcNAc (2-(acetyl-d3-amino)-2-deoxy-1,3,4,6-tetra-O-acetyl-D-glucopyranose) 
(Ac4GlcNAc-d3) was synthesized 
as previously described and the spectroscopic data were consistent with those in 
the literature. 293T cells (5 × 10°) were pretreated with 100 µM isotope-labelled 
Ac4GlcNAc-d3 for 12 h and transfected with Flag-TRADD death domain and 
NleB for another 12 h. Flag-TRADD death domain was immunopurified as 
for mass spectrometry measurement of the total mass and prepared in 
50 mM NH4HCO3, pH 8.0, buffer. Flag peptide was depleted with Flag M2 beads 
and the supernatant was subjected to trypsin digestion at 37 °C for 1 h. The 
resulting tryptic peptides were analysed by ETD as described earlier. 

Mice infection and C. rodentium colonization assays. All animal experiments 
were conducted following the Ministry of Health national guidelines for housing 
and care of laboratory animals and performed in accordance with institutional 
regulations after review and approval by the Institutional Animal Care and Use 
Committee at National Institute of Biological Sciences. Male C57BL/6 mice, 5-6 
weeks old, maintained in the specific pathogen-free environment were used. All 
animals were housed in individual high-efficiency particulate air (HEPA)-filtered 
cages with sterile bedding. Mice were randomized into each experimental group 
with no blinding. Independent experiments were performed using 7-8 mice per 
group. 

Deletion of the gene encoding NleBc in C. rodentium strain DBS100 (ATCC51459; 
ATCC) was generated by standard homologous recombination using the suicide 
vector pCVD442, as described previously. The DNA of pNleBc was amplified by 
PCR from genomic DNA of DBS100 strain to include a putative upstream promoter 
sequence (~300 bp), using the forward primer 
5'-TCAGGGCCGGCCGACTGGAACATATGCGGG-3' and reverse primer 
5'-TGACGGCGCGCCTTACCATGAACTGTTGGTATACATACTG-3'. 
For oral inoculation and harvesting, C. rodentium 
wild-type strain and respective derivatives were prepared by overnight shaking 
of bacterial culture at 37 °C in LB broth. Mice were orally inoculated using a gavage 
needle with 200 µl suspension of bacteria in PBS (~2 × 10° CFU). The number of 
viable bacteria used as the inoculum was determined by retrospective plating onto LB 
agar containing the appropriate antibiotics. Stool samples were recovered aseptically 
at various time points after inoculation, and the number of viable bacteria per gram of 
stool was determined after homogenization in PBS and plating onto LB agar 
containing the appropriate antibiotics. Eight days after inoculation, colons were 
removed aseptically, weighed and homogenized in PBS. Homogenates were seri- 
ally diluted and plated to determine CFU counts. Colonization data were analysed 
using a Student’s t-test in the commercial software GraphPad Prism. P < 0.05 was 
considered significant. 
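
The colonization readout described above (plate counts converted to log10 CFU per gram, summarized as mean ± s.e.m. and compared with a Student's t-test) can be sketched with the Python standard library alone. The paper used GraphPad Prism; the code below is an independent illustration, the function names are hypothetical, and no values from the study are used. It returns the t statistic and degrees of freedom, and the P value would still need a t-distribution lookup.

```python
import math
from statistics import mean, stdev


def log10_cfu_per_gram(colonies, dilution_factor, plated_ml, sample_g, homogenate_ml):
    """log10 CFU per gram from one plate count of a diluted homogenate."""
    cfu_per_ml = colonies * dilution_factor / plated_ml
    return math.log10(cfu_per_ml * homogenate_ml / sample_g)


def mean_sem(values):
    """Mean and standard error of the mean."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def student_t(a, b):
    """Two-sample Student's t statistic (equal variances) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2
```

Working on log10-transformed counts, as the figure legends do, keeps the mean and s.e.m. from being dominated by a few animals with very high bacterial loads.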

Successful infection by enteric bacterial pathogens depends on the 
ability of the bacteria to colonize the gut, replicate in host tissues and 
disseminate to other hosts. Pathogens such as Salmonella, Shigella 
and enteropathogenic and enterohaemorrhagic (EPEC and EHEC, 
respectively) Escherichia coli use a type III secretion system (T3SS) to 
deliver virulence effector proteins into host cells during infection 
that promote colonization and interfere with antimicrobial host 
responses. Here we report that the T3SS effector NleB1 from 
EPEC binds to host cell death-domain-containing proteins and 
thereby inhibits death receptor signalling. Protein interaction studies 
identified FADD, TRADD and RIPK1 as binding partners of NleB1. 
NleB1 expressed ectopically or injected by the bacterial T3SS pre- 
vented Fas ligand or TNF-induced formation of the canonical 
death-inducing signalling complex (DISC) and proteolytic activation 
of caspase-8, an essential step in death-receptor-induced apoptosis. 
This inhibition depended on the N-acetylglucosamine transferase 
activity of NleB1, which specifically modified Arg 117 in the death 
domain of FADD. The importance of the death receptor apoptotic 
pathway to host defence was demonstrated using mice deficient in the 
FAS signalling pathway, which showed delayed clearance of the 
EPEC-like mouse pathogen Citrobacter rodentium and reversion 
to virulence of an nleB mutant. The activity of NleB suggests that 
EPEC and other attaching and effacing pathogens antagonize death- 
receptor-induced apoptosis of infected cells, thereby blocking a major 
antimicrobial host response. 

Members of the death receptor family such as FAS and TNFR1 contain 
an intracellular death domain and require caspase-8 for the activation 
of effector caspases and consequent cell death by the extrinsic 
apoptotic pathway. Within seconds of engagement by Fas ligand 
(FasL), pre-associated FAS trimers undergo multimerization and recruit 
the death-domain-containing adaptor protein FADD through heterotypic 
death domain interactions. FADD then recruits pro-caspase-8 
forming the death-inducing signalling complex (DISC), whereupon 
caspase-8 undergoes conformational change, auto-proteolytic processing 
and release from the DISC to allow cleavage of cytosolic substrates, 
including pro-caspase-3 and Bid. Similarly, upon stimulation by TNF, 
TNFR1 recruits the adaptor protein TRADD and the receptor kinase, 
RIPK1, leading to NF-κB activation. Alternatively, cell death may be 
induced through TNFR1 internalisation and formation of a secondary 
complex that contains TRADD/RIPK1/FADD and caspase-8 (refs 6, 7). 

Previously, we reported that the EPEC T3SS effector NleB1 inhibited 
IκB degradation in cells upon stimulation with TNF but not IL-1β, 
suggesting that NleB1 interfered with death receptor signalling. To 
identify host cell binding partners of NleB1, we screened a yeast two-hybrid 
complementary DNA (cDNA) library derived from HeLa cells 
using NleB1 as bait. FADD and RIPK1 were recovered several times. 

Figure 1 | NleB1 binds death domain proteins and inhibits caspase-8 
activation. a, Yeast two-hybrid analysis of protein-protein interactions in 
S. cerevisiae PJ69-4A. Results are mean ± s.e.m. β-galactosidase activity from 
at least three independent experiments performed in triplicate. DD, death 
domain; DED, death effector domain. b, Growth of S. cerevisiae PJ69-4A on 
medium to select for protein-protein interactions (left panel) or plasmid 
maintenance (right panel). Representative images from at least three 
independent experiments. c, Yeast two-hybrid analysis of protein interactions 
in S. cerevisiae PJ69-4A. Results are mean ± s.e.m. β-galactosidase activity from 
at least three independent experiments performed in triplicate. d, GFP-Trap of 
EGFP-NleB1 and detection of FADD-Flag, TRADD-Flag and RIPK1-Flag in 
HEK293T cells. Actin, loading control. Representative immunoblot from at 
least three independent experiments. e, MTT reduction in HeLa cells 
expressing EGFP, EGFP-NleB1 or EGFP-NleB2. UT, untransfected. Results 
are the mean ± s.e.m. of absorbance at 540 nm from three independent 
experiments performed in triplicate. *P < 0.0001, unpaired, two-tailed t-test. 
f, Cleaved caspase-8 in HeLa cells expressing EGFP or EGFP-NleB1. p43/41 
and p18 are products of processed caspase-8. UT, untransfected. Representative 
immunoblot from at least three independent experiments. 
Figure 2 | Enzymatic activity of NleB1. a, In vitro assay for NleB1 
N-acetylglucosamine (GlcNAc) modification of FADD using recombinant 
proteins and 1 mM UDP-GlcNAc. Representative immunoblot from at least 
three independent experiments. b, Intact protein mass spectrometry of FADD 
incubated with GST-NleB1 and UDP-GlcNAc. c, High resolution collision-induced 
dissociation (CID) spectrum of the peptide corresponding to 
FADD'!*-”°. The * denotes diagnostic fragment ions that carry the GlcNAc 
modification. d, Cleaved caspase-8 in FasL-treated HeLa cells expressing EGFP, 
EGFP-NleB1 or EGFP-NleB1,,,. UT, untransfected. Actin; loading control. 
Representative immunoblot from at least three independent experiments. 

e, Cleaved caspase-8 in HeLa cells infected with derivatives of EPEC and treated 


Transformation into a different yeast strain demonstrated that the 
interaction of NleB1 with FADD and RIPK1 occurred through their 
respective death domains (Fig. 1a–c). Studies in transfected HEK293T 
cells showed that FADD-Flag, TRADD-Flag and RIPK1-Flag co- 
purified with an EGFP-NleB1 (EGFP, enhanced green fluorescent 
protein) fusion protein as well as a 2HA-tagged NleB1 translocated 
during EPEC E2348/69 infection (Fig. 1d and Extended Data Fig. 1a, 
b). The death domain of FADD was required for binding to NleB1 
from different attaching and effacing pathogens (Fig. 1d and Extended 
Data Fig. 1b). As TRADD and RIPK1 are critical components of the 
TNFR1 signalling complex, we investigated the influence of NleB1 on 
inhibition of the NF-κB dependent cytokine, IL-8, during EPEC infection. 
Unlike NleE, a known inhibitor of NF-κB activation, neither 
NleB1 nor its close homologue NleB2 appeared to inhibit IL-8 pro- 
duction during EPEC infection (Extended Data Fig. 1c). Therefore we 
focused our attention on FADD-dependent apoptosis signalling. After 
treatment with FasL, HeLa cells transfected with pEGFP-NleB1 
showed significantly higher levels of MTT (3-(4,5-dimethylthiazol-2- 
yl)-2,5-diphenyltetrazolium bromide) reduction (mitochondrial acti- 
vity assay) compared to pEGFP transfected cells (Fig. le), suggesting 
that NleB1 blocked cell death. Immunoblot analysis with an antibody 
that recognizes only cleaved caspase-8 revealed that FasL-induced 
with FasL. Representative immunoblot from at least three independent 
experiments. pNleB1, pTrc99A carrying nleB1. pNleB2, pTrc99A carrying 
nleB2. f, Quantification of cleaved caspase-8 by immunofluorescence 
microscopy of HeLa cells infected with derivatives of EPEC and treated with 
FasL. Results are mean ± s.e.m. of the percentage of cells with cleaved caspase-8 
from two independent experiments counting ~200 cells in triplicate. 
*P < 0.0001 compared to uninfected, unstimulated control, one-way ANOVA. 
g, Immunofluorescence staining for detection of cleaved caspase-8 induced by 
FasL in HeLa cells infected with derivatives of EPEC. Scale bar, 10 µm. 
Representative images are shown from at least three independent experiments. 


caspase-8 cleavage was substantially reduced in HeLa cells transfected 
with pEGFP-NleB1 (Fig. 1f). By immunofluorescence microscopy, 
fewer cells expressing EGFP-NleB1 contained cleaved caspase-8 after 
treatment with FasL (Extended Data Fig. 1d, e). In contrast, a comparable 
number of cells contained cleaved caspase-3 in untransfected, 
EGFP and EGFP-NleB1 expressing cells treated with tunicamycin 
(Extended Data Fig. 1e). This indicated that NleB1 had no effect on 
activation of the intrinsic apoptotic pathway. 

NleB from the murine attaching and effacing pathogen C. rodentium 
was recently described as an N-acetylglucosamine (GlcNAc) transferase 
and a member of the glycogenin family of enzymes’®. Given the 
ability of NleB1 to bind FADD and inhibit proteolytic activation of 
caspase-8, we examined whether FADD was post-translationally modified 
by NleB1. Following incubation with GST-NleB1 and UDP-GlcNAc, 
we observed GlcNAc modification of His-FADD (Fig. 2a). 
This modification was not present upon incubation with an NleB1 
catalytic site mutant (NleB1_AAA)'°. Similar modification of FADD-Flag 
occurred upon ectopic expression of EGFP-NleB1 in HeLa cells 
(Extended Data Fig. 1f). Intact protein liquid chromatography–mass 
spectrometry (LC-MS) of His-FADD incubated with GST-NleB1 and 
UDP-GlcNAc revealed a mass shift matching a single GlcNAc modification 
on FADD (Fig. 2b). Peptide sequencing of multiple spectra 
Figure 3 | Inhibition of FasL-induced DISC formation and cell death by 
EPEC. a, Cleaved caspase-8 in HeLa cells infected with derivatives of EPEC. 
Actin; loading control. Representative immunoblot from at least three 
independent experiments. b, Cell death visualized by propidium iodide (PI) 
staining in HeLa cells infected with derivatives of EPEC and treated with FasL. 
Scale bar, 20 µm. Representative images shown from at least three independent 
experiments. c, Quantification of PI staining and microscopic analysis in HeLa 
cells infected with derivatives of EPEC and treated with FasL. Results are 
mean ± s.e.m. of percentage cells with PI staining from two independent 
experiments counting ~200 cells in triplicate. *P < 0.0001 compared to 
E2348/69 infected cells, one-way ANOVA. d, Cleaved and full-length caspase-8 in 
HeLa cells infected with derivatives of EPEC and treated with FasL. 
Representative immunoblot from at least three independent experiments. 

e, DISC components induced by FasL coupled to Fcγ and precipitated with 
protein G beads. UN, untreated and uninfected; UI, uninfected. Representative 
immunoblot from at least three independent experiments. 


from in-gel digests unambiguously identified Arg 117 as the site of 
N-acetylglucosylation (Fig. 2c, Extended Data Figs 2–3, Supplementary 
Table 1). This was confirmed by substitution of Arg 117 in FADD 
with alanine, whereas alanine substitution at Ser 122 had no effect on 
NleB-mediated N-acetylglucosylation (Extended Data Fig. 4). Arg 117 
is located at the interface of the FAS-FADD death domain interaction 
and is critical for assembly of the FAS-FADD oligomeric complex and 
formation of the DISC’. Accordingly, EGFP-NleB1 but not catalytically 
inactive EGFP-NleB1_AAA inhibited caspase-8 activation (Fig. 2d). 
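
The mass-shift reasoning behind the intact-protein result can be illustrated with a minimal sketch. It assumes that each transferred GlcNAc adds one HexNAc residue (C8H13NO5) to the protein; the function names and the tolerance are hypothetical and are not taken from the authors' analysis. Monoisotopic masses are used here, whereas a deconvoluted intact-protein mass is usually an average mass (a shift of about 203.2 Da), so the tolerance is deliberately loose.

```python
from typing import Optional

# Standard monoisotopic element masses in daltons.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Assumption: one GlcNAc transfer adds a HexNAc residue, C8H13NO5.
HEXNAC_RESIDUE = {"C": 8, "H": 13, "N": 1, "O": 5}


def monoisotopic_mass(formula: dict) -> float:
    """Sum the monoisotopic masses of the atoms in an elemental formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def n_glcnac(observed_shift_da: float, tolerance_da: float = 1.0) -> Optional[int]:
    """Return how many HexNAc residues explain a mass shift, or None if none fit."""
    delta = monoisotopic_mass(HEXNAC_RESIDUE)
    n = round(observed_shift_da / delta)
    if n >= 0 and abs(observed_shift_da - n * delta) <= tolerance_da:
        return n
    return None


if __name__ == "__main__":
    print(f"HexNAc residue mass: {monoisotopic_mass(HEXNAC_RESIDUE):.4f} Da")
    # Hypothetical observed shift, for illustration only.
    print(n_glcnac(203.1))
```

A shift that fits one residue is what "a single GlcNAc modification" means in mass terms; a shift that fits no whole number of residues returns `None`.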

During infection, NleB1 delivered by the EPEC T3SS inhibited 
FasL-induced caspase-8 activation in HeLa cells. This inhibition was 
lost in cells infected with a ΔescN mutant that has a non-functional 
T3SS, or a ΔnleB1/B2 double mutant (Fig. 2e–g). Complementation 
of the ΔnleB1/B2 mutant with nleB1 but not nleB2 restored the ability 
of EPEC to inhibit caspase-8 activation (Fig. 2e–g), demonstrating that 
the two NleB proteins have distinct functions. This mirrored our protein 
interaction studies, which showed that NleB2 did not interact with 
FADD or TRADD and bound only weakly to RIPK1 (Extended Data 
Fig. 5a–d). 

Arg 117 in FADD is essential for FAS-FADD and TRADD-FADD 
death domain interactions''’*. Consistent with the role of NleB1 in 
inhibition of FADD-dependent caspase-8 activation, an nleB1 mutant 
was unable to inhibit FasL- or TNF-induced activation of caspase-8 
whereas the complemented strain was as effective as wild-type EPEC 
(Fig. 3a and Extended Data Fig. 5e, f). EPEC expressing NleB1 also 
inhibited FasL-induced apoptosis as measured by propidium iodide 
(PI) staining (Fig. 3b, c). These phenotypes were not due to differences 
in adherence (Extended Data Fig. 5g). 
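
The quantification reported for the PI and cleaved caspase-8 counts (mean ± s.e.m. of the percentage of positive cells) can be sketched as follows. This is a minimal illustration rather than the authors' analysis script; the function names and the counts in the usage example are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def percent_positive(positive: int, total: int) -> float:
    """Percentage of counted cells that are positive for the stain."""
    if total <= 0:
        raise ValueError("total must be a positive cell count")
    return 100.0 * positive / total


def mean_sem(values: list) -> tuple:
    """Mean and standard error of the mean (sample s.d. divided by sqrt(n))."""
    if len(values) < 2:
        raise ValueError("at least two values are needed for an s.e.m.")
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    # Hypothetical counts of positive cells out of ~200 cells per replicate.
    counts = [(42, 200), (38, 195), (47, 210)]
    percentages = [percent_positive(p, t) for p, t in counts]
    m, sem = mean_sem(percentages)
    print(f"{m:.1f} ± {sem:.1f} % positive cells")
```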

NleB1 N-acetylglucosamine transferase activity was required for inhi- 
bition of FasL-induced caspase-8 processing during infection (Fig. 3d). 
To examine the influence of NleB1 on FAS DISC assembly, which pre- 
cedes caspase-8 activation, we immunoprecipitated the FAS receptor 
complex from infected HeLa cells treated with FasL (Fig. 3e). Prior 
infection of cells with EPEC substantially inhibited FasL-induced 
DISC formation and this required NleB1 N-acetylglucosamine transfer- 
ase activity (Fig. 3e). 

We next examined the effect of NleB on inhibition of caspase-8 
processing in vivo by infecting wild-type C57BL/6 mice with C. rodentium 
or a ΔnleB mutant of C. rodentium and staining tissue sections 
for cleaved caspase-8. Although few cells were positive for cleaved 
caspase-8 during C. rodentium infection, significant numbers of cells 
were positive for cleaved caspase-8 during infection with the ΔnleB 
mutant (Fig. 4a). Many of these caspase-8 positive cells were present as 
sloughed cells in the gut lumen (Extended Data Fig. 6a). 

The observation that NleB1 inhibited FasL-induced caspase-8 cleavage 
suggested that FAS-mediated apoptosis contributes to host defence 
against attaching and effacing pathogens. We tested this by infecting 
FAS-deficient Fas^lpr/lpr mutant mice with C. rodentium’. As shown 
previously, nleB was essential for efficient colonization of wild-type 
C57BL/6 mice’ (Fig. 4b). Although there was no difference in faecal 
shedding of C. rodentium between C57BL/6 and Fas^lpr/lpr mice during 
the acute phase of infection (first 12 days) (Fig. 4b), Fas^lpr/lpr mice 
developed severe, watery diarrhoea (Fig. 4c). In addition, C. rodentium 
penetrated the crypts in Fas^lpr/lpr mice, unlike in C57BL/6 mice in 
which the bacteria were predominantly located at the tips of the villi 
(Fig. 4d). To study the resolving phase of infection (post day 12), C57BL/6 
and Fas^lpr/lpr mice were infected with a lower dose of C. rodentium to 
avoid the rapid onset of severe diarrhoea and monitored for up to 33 days. 
Fas^lpr/lpr mice were impaired in their ability to clear C. rodentium compared 
to C57BL/6 mice (Fig. 4e and Extended Data Fig. 6b). 

The development of severe diarrhoea in C. rodentium-infected 
Fas^lpr/lpr mice was unexpected, and sections of colon from mice infected 
with C. rodentium for 10 days were examined to identify differences in 
pathology between Fas^lpr/lpr mice and C57BL/6 mice. Histological analysis 
of colons from Fas^lpr/lpr mice revealed greater erosion of the 
epithelium (tissue damage) as well as increased colon weight and crypt 
height compared to C57BL/6 mice (measures of C. rodentium induced 
intestinal hyperplasia?) (Extended Data Fig. 7a–d). In addition, 
Fas^lpr/lpr mice showed more pronounced neutrophil infiltration, which 
penetrated across the muscularis mucosa (Extended Data Fig. 7e). 
Similar phenotypes were observed in FasL-deficient Fasl^gld/gld mice 
and Bid-deficient mice (Fig. 4d, e and Extended Data Fig. 8). Bid is 
not required for FasL-induced apoptosis in lymphoid cells, but is 
required for FasL-induced apoptosis in several other cell types’*"®. 
This suggests that FAS signalling in epithelial cells rather than 
lymphoid cells limits C. rodentium-induced colitis. 

We reasoned that if NleB interfered with FAS-mediated apoptosis 
signalling during infection, then C. rodentium ΔnleB mutant bacteria 
would no longer be attenuated in mice that carry defects in this apoptosis 
pathway. Indeed, Fas^lpr/lpr mice infected with the ΔnleB mutant showed 
levels of bacterial colonization and severity of diarrhoea comparable to 
those of their littermates infected with C. rodentium (Extended 
Figure 4 | Infection of mice deficient for FAS signalling with C. rodentium. 
a, Immunofluorescence staining and quantification of cleaved caspase-8 in 
colonic sections from C57BL/6 mice infected with C. rodentium (CR) or a 
C. rodentium nleB mutant. Representative images from at least three separate 
sections of colon at least 100 µm apart (transverse or longitudinal), per animal 
from five individual mice per group. Results are mean ± s.e.m. of percentage 
fluorescence intensity of cleaved caspase-8 staining relative to Hoechst staining 
of five fields with C. rodentium infection from three mice per group. 
*P = 0.0001, unpaired two-tailed t-test. b, Bacterial load in the faeces during 
acute infection with C. rodentium. Each data point represents log10 c.f.u. per 
100 mg faeces per individual animal on days 4, 8 and 12 after infection (c.f.u., 
colony forming units). Mean ± s.e.m. are indicated, dotted line represents the 
detection limit. P values from Mann-Whitney U-test. c, Diarrhoea score at day 
4, 8 and 12 post-infection. The scoring system is described in the Methods. 


Data Figs 6b and 9a, b). The same was not true for ΔespI or ΔespF 
mutants of C. rodentium, which retained their known colonization 
defects in Fas^lpr/lpr mice (Extended Data Fig. 9c, d)'”'®. EspI and EspF 
are T3SS effector proteins with functions unrelated to NleB1 and the 
inhibition of FAS signalling. 

During infection with attaching and effacing pathogens, the predom- 
inant site of NleB activity is in enterocytes where the translocation of 
bacterial effector proteins occurs’. NleB-mediated inhibition of FasL/ 
FAS-induced apoptosis may prolong bacterial attachment to host cells 
and thereby promote bacterial shedding in the faeces. Indeed, NleB is 
associated with the transmission of C. rodentium amongst littermates’” 
and is epidemiologically associated with outbreaks of haemolytic 
uraemic syndrome caused by EHEC””’. Conversely, the reduced 
colonization of C57BL/6 mice by C. rodentium ΔnleB mutants and 
increased numbers of activated caspase-8 positive cells may reflect 
partial protection by the FAS apoptotic pathway through the elimina- 
tion of infected cells'*. This is consistent with the prolonged shedding 
of C. rodentium in FAS-deficient animals compared to C57BL/6 mice. 
The increased intestinal pathology observed in C. rodentium-infected 
FAS-, FasL- and Bid-deficient mice has also revealed a function for the 
FAS apoptotic pathway in limiting the extent of colitis during gastro- 
intestinal infection. This aggravation of disease may be mirrored in 
humans, as nucleotide polymorphisms in the FASLG gene, encoding 
FasL, have been associated with the development of inflammatory 
bowel disease”’. In summary, the discovery of a bacterial T3SS effector 
that specifically modifies death domain containing proteins through a 
Mean ± s.e.m. are indicated. P values from one-way ANOVA. 
d, Immunofluorescence staining and quantification of C. rodentium 
penetration into intestinal crypts in colonic sections from C57BL/6, Fas^lpr/lpr 
and Fasl^gld/gld mice. Representative images from at least two separate sections of 
colon at least 100 µm apart (transverse or longitudinal), per animal from five 
individual mice per group. Results are the mean ± s.e.m. maximum distance of 
C. rodentium staining from the epithelial surface (in µm) of five independent 
sections with at least three measurements per section. P = 0.0001, unpaired 
two-tailed t-test. e, Bacterial load in the faeces during the resolving phase of 
C. rodentium infection. Each data point represents log10 c.f.u. per 100 mg faeces 
per individual animal on days 12, 16, 21, 23 and 25 post-infection. 
Mean ± s.e.m. are indicated, dotted line represents detection limit. P values 
from Mann-Whitney U-test. Scale bars, 100 µm. 


newly identified post-translational modification and thereby inhibits 
death receptor mediated apoptosis has revealed a new virulence mech- 
anism in bacterial infection. 


METHODS SUMMARY 


Detailed information including bacterial mutant construction, immunofluores- 
cence, immunoblot details, mouse infections, histology and pathology scoring, 
imaging and image analysis, and associated references are included in the 
Methods. Protein interactions were investigated using the yeast two-hybrid 
system and co-immunoprecipitation. Activation of caspase-8 was assessed by 
immunoblot and immunofluorescence microscopy using antibodies specific for 
cleaved caspase-8. Cell death was examined by MTT reduction and propidium 
iodide staining. Post-translational modifications of FADD were identified by intact 
protein mass spectrometry and collision-induced dissociation (CID) and electron 
transfer dissociation (ETD) peptide fragmentation. All genetically modified and 
spontaneous mutant mice are on a C57BL/6 background and all animals were 
inoculated by oral gavage with 200 µl containing approximately 1 × 10° or 1 × 10° 
c.f.u. of C. rodentium. All animal experimentation was approved by the University 
of Melbourne Animal Ethics Committee, applications 0808209 and 1112062. 
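
Bacterial loads in this study are reported as log10 c.f.u. per 100 mg faeces. The following is a minimal sketch of that unit conversion, assuming a workflow (not described in this summary) in which a weighed faecal sample is resuspended in a known volume, serially diluted and plated. All parameter names and the example values are hypothetical.

```python
from math import log10
from typing import Optional


def log10_cfu_per_100mg(
    colonies: int,
    dilution_factor: float,
    plated_volume_ml: float,
    resuspension_volume_ml: float,
    faeces_mg: float,
) -> Optional[float]:
    """Convert a plate count to log10 c.f.u. per 100 mg faeces.

    Returns None when no colonies were counted, i.e. the sample is
    below the detection limit of the assumed plating scheme.
    """
    if plated_volume_ml <= 0 or resuspension_volume_ml <= 0 or faeces_mg <= 0:
        raise ValueError("volumes and sample mass must be positive")
    if colonies <= 0:
        return None
    cfu_per_ml = colonies * dilution_factor / plated_volume_ml
    total_cfu = cfu_per_ml * resuspension_volume_ml
    return log10(total_cfu * 100.0 / faeces_mg)


if __name__ == "__main__":
    # Hypothetical plate: 50 colonies from 0.1 ml of a 10^4-fold dilution,
    # 100 mg faeces resuspended in 1 ml.
    print(log10_cfu_per_100mg(50, 1e4, 0.1, 1.0, 100.0))
```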


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 


Bacterial strains, plasmids, yeast strains and growth conditions. The bacterial 
strains, yeast strains and plasmids used in this study are listed in Supplementary Table 2. 
All PCR primers are listed in Supplementary Table 3. Bacteria were grown at 37 °C in 
Luria-Bertani (LB) medium or Dulbecco’s modified Eagle’s medium (DMEM) 
(Gibco) where indicated and supplemented with ampicillin (100 µg ml^-1), kanamycin 
(100 µg ml^-1), nalidixic acid (50 µg ml^-1) or chloramphenicol (25 µg ml^-1) when 
necessary. Yeast strains were grown at 30 °C in YPD (yeast extract/peptone/dextrose) 
medium or yeast nitrogen minimal medium supplemented with 2% glucose and 
amino acids including histidine (20 µg ml^-1), methionine (20 µg ml^-1), tryptophan 
(20 µg ml^-1), adenine (20 µg ml^-1), uracil (20 µg ml^-1) and leucine (30 µg ml^-1) 
when necessary. For infection of HeLa cells, overnight cultures of EPEC grown in 
LB medium were subcultured 1:50 into DMEM and grown statically for 3–4 h at 
37 °C with 5% CO2. The optical density (D600 nm) of the bacterial cultures was 
measured to standardize the inoculum before infection. Cultures were induced with 
1 mM isopropyl-β-D-thiogalactopyranoside (IPTG) for 30 min before infection. 
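
The arithmetic behind the 1:50 subculture and the optical-density standardization of the inoculum can be sketched as below. This is an illustration only: it assumes that 1:50 means one part culture in 50 parts final volume and that the inoculum is scaled inversely with the measured optical density. The function names, the reference density and the volumes are hypothetical.

```python
def subculture_volumes(final_volume_ml: float, dilution: float = 50.0) -> tuple:
    """Volumes of overnight culture and fresh medium for a 1:dilution subculture."""
    if final_volume_ml <= 0 or dilution < 1:
        raise ValueError("final volume must be positive and dilution at least 1")
    inoculum_ml = final_volume_ml / dilution
    return inoculum_ml, final_volume_ml - inoculum_ml


def normalised_inoculum_ul(od_measured: float, od_reference: float,
                           reference_volume_ul: float) -> float:
    """Volume delivering the same cell number as reference_volume_ul at od_reference."""
    if od_measured <= 0:
        raise ValueError("measured optical density must be positive")
    return reference_volume_ul * od_reference / od_measured


if __name__ == "__main__":
    culture_ml, medium_ml = subculture_volumes(10.0)
    print(f"{culture_ml:.2f} ml overnight culture + {medium_ml:.2f} ml DMEM")
    # Hypothetical: a culture twice as dense as the reference needs half the volume.
    print(normalised_inoculum_ul(0.8, 0.4, 100.0))
```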
Construction of NleB, TRADD, RIPK1 and FADD expression vectors. The 
plasmids and primers used in this study are listed in Supplementary Tables 2 and 3, 
respectively. The nleB1 gene from EPEC E2348/69 (GenBank accession CAS10779), 
nleB1 from EHEC O157:H7 EDL933 (GenBank accession NP_289553.1) and nleB 
from C. rodentium (GenBank accession NC_013716.1) were amplified from genomic 
DNA by PCR using the primer pairs NleB1_F/NleB1_R, NleBlgy/NleBl gy or 
NleB_CRF/NleB_CRR, respectively, and ligated into EcoRI/BamHI digested pEGFP-C2 
to generate N-terminal EGFP fusions to NleB (EGFP-NleB1, EGFP-NleB1_EHEC, 
EGFP-NleB_CR). The nleB2 gene was amplified from EPEC E2348/69 genomic 
DNA (GenBank accession CAS08589) using primer pairs NleB2_F/NleB2_R and 
ligated into EcoRI/BamHI digested pEGFP-C2. To generate the complementing 
vectors, pNleB1 and pNleB2, nleB1 and nleB2 were amplified from EPEC E2348/69 
genomic DNA by PCR using the primer pairs NleB1_F/NleB1_R and NleB2_F/NleB2_R, 
and ligated into EcoRI/BamHI digested pTrc99A. To create pNleB1-2HA and 
pNleB2-2HA, nleB1 or nleB2 was amplified from EPEC E2348/69 genomic DNA 
by PCR using the primer pairs NleB1_F/NleB1_R-2HA and NleB2_F/NleB2_R-2HA, 
respectively, and ligated into EcoRI/BamHI digested pTrc99A. To generate GST 
tagged NleB, nleB1 was amplified from EPEC E2348/69 genomic DNA by PCR 
using the primer pair NleB1_F/NleB1_pGEXR and ligated into EcoRI/SalI digested 
pGEX-4T-1. 

TRADD-Flag was kindly provided by J. Tschopp. FADD-Flag and RIPK1-Flag 
were provided by A. Mansell. To generate FADD_ΔDD-Flag, the coding region for 
amino acids 1–96 of FADD was amplified by PCR from FADD-Flag using the 
primer pair FADD_F/FADD_ΔDDR and ligated into EcoRI/BamHI digested 
p3XFlag-Myc-CMV to generate an N-terminal 3×Flag fusion to FADD_1–96 
(FADD_ΔDD-Flag). To generate His-FADD, FADD was amplified from FADD-Flag 
using the primer pair FADD_HisF/FADD_HisR and ligated into BamHI/SalI 
digested pET28a. 

To generate pGBT-NleB1, nleB1 was amplified from EPEC E2348/69 or EHEC 
O157:H7 EDL933 genomic DNA by PCR using primer pairs NleBlgpgprr/NleBla 
or NleBlgucpre/NleBlgyr (pGBT-NleB1 and pGBT-NleB1_EHEC). The PCR products 
were digested with SmaI/EcoRI and EcoRI/BamHI, respectively, and ligated 
into pGBT9. The plasmids pGAD-FADD_DED and pGAD-FADD_DD were generated 
from the template pGAD-FADD recovered from the cDNA library screen 
using primer pairs FADDgapr/FADDpgpr and FADDppr/FADDgapr, respectively. 
The FADD_DED and FADD_DD PCR products were digested with EcoRI/HindIII 
and EcoRI/SalI, respectively, and ligated into pGAD424. The pGAD-RIPK1_DD 
plasmid was generated using primers RIPK1pp,/RIPKlc¢apr using 
pGAD-RIPK1 recovered from the cDNA library screen as template DNA, and 
cloned using the EcoRI and BamHI restriction sites. 

Construction of non-polar deletion mutants. C. rodentium strain ICC169 ΔnleB 
(ICC911) was constructed using the lambda Red-based mutagenesis system”. The 
nleB gene was replaced by a kanamycin resistance cassette following integration of 
a linear PCR product generated using the primer pair crNleB-pKD4-Fw/crNleB-pKD4-Rv, 
which contains 50-nucleotide homology extensions corresponding to 
regions 5’ and 3’ of the nleB open reading frame and priming sequences for the 
kanamycin cassette of plasmid pKD4 (Supplementary Table 3)”*. The PCR product 
was DpnI-digested to remove residual template, gel purified and electroporated 
into ICC169 carrying the temperature-sensitive lambda Red helper plasmid 
pKD46 following induction of lambda Red recombinase with 10 mM L-arabinose 
at 30 °C. The deletion mutant of C. rodentium nleB was confirmed by PCR and 
DNA sequencing and the growth rates of the mutant and parental strains were 
assessed in rich and minimal media. The C. rodentium strains ICC169 ΔespI 
(ICC179) and ICC169 ΔespF (ICC177) have been described previously'’. EspI 
binds to Sec24 and inhibits COPII-dependent vesicle trafficking while EspF is a 
multifunctional effector that has been associated with destabilization of mitochondria, 
induction of apoptosis, activation of sorting nexin 9 and disruption of tight 
junctions’. These mutants were chosen as controls because they exhibit varying 
degrees of attenuation in the mouse model of C. rodentium infection and have 
functions unrelated to NleB. 
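
The layout of a lambda Red knockout primer pair, a 50-nucleotide homology extension followed by a priming sequence for the resistance cassette, can be sketched as below. This is an illustration of the general design, not the authors' primers, which are listed in Supplementary Table 3. It assumes the homology arms lie immediately upstream and downstream of the open reading frame; the toy sequences and the priming sites in the usage example are placeholders and are not the real pKD4 priming sites.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def lambda_red_primers(genome: str, orf_start: int, orf_end: int,
                       fwd_priming: str, rev_priming: str, arm: int = 50) -> tuple:
    """Build forward and reverse knockout primers (both written 5' to 3').

    orf_start is the 0-based index of the first ORF base and orf_end the
    index one past the last ORF base, both on the top strand.
    """
    if orf_start < arm or orf_end + arm > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    upstream = genome[orf_start - arm:orf_start]
    downstream = genome[orf_end:orf_end + arm]
    forward = upstream + fwd_priming
    reverse = reverse_complement(downstream) + rev_priming
    return forward, reverse


if __name__ == "__main__":
    # Toy locus: 50 nt upstream, a 9-nt placeholder ORF, 50 nt downstream.
    toy = "A" * 50 + "ATGCCCTAA" + "G" * 50
    fwd, rev = lambda_red_primers(toy, 50, 59, "GTGT", "CATA")
    print(fwd)
    print(rev)
```

Recombination between the two arms replaces everything between them with the amplified cassette, which is why the arms are taken from the flanks and not from within the gene.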

The EPEC E2348/69 nleB1 mutant was created with the one-step PCR lambda 
Red method using primers NleB1 E69FRT};55 and NleB1 E69FRT R156. pKD3 was 
used as template DNA to amplify the chloramphenicol cassette. The cassette was 
electroporated into wild-type EPEC E2348/69 and positive clones were selected on 
LB agar with 5 µg ml^-1 chloramphenicol. The mutations were checked by PCR 
using primer pairs from outside and inside the mutation. The EPEC E2348/69 nleB2 
mutant was created using primers NleB2 E69FRTp;5 and NleB2 E69FRTRij6. 
pKD4 was used as template DNA to amplify the kanamycin cassette. The cassette 
was electroporated into wild-type EPEC E2348/69 and positive clones were selected 
on LB agar with 50 µg ml^-1 kanamycin. The mutations were checked by PCR with a 
combination of primers from outside and inside the mutation. The EPEC E2348/69 
nleB1/nleB2 double mutant was generated using the one-step PCR method by 
amplifying the nleB2 mutation from the nleB2 mutant with primers NleB22;2 
and NleB25,3. The PCR product was transferred into the nleB1 mutant strain by 
electroporation. Mutants were first selected on LB agar plates with 20 µg ml^-1 of 
kanamycin and then grown with chloramphenicol (25 µg ml^-1) and kanamycin 
(50 µg ml^-1) to select for nleB1 and nleB2 allelic replacement. 

Site-directed mutagenesis. The His-FADD_R117A and His-FADD_S122A mutants, 
and the NleB1 mutants pNleB1_AAA, EGFP-NleB1_AAA and pGEX-NleB1_AAA, were 
generated using the Stratagene QuikChange II Site-Directed Mutagenesis Kit. 
FADD_R117A and FADD_S122A were generated by PCR using the primer pairs 
FADD_(R117A)F/FADD_(R117A)R and FADD_(S122A)F/FADD_(S122A)R, respectively, with 
His-FADD as template DNA. For the NleB mutants, pNleB1, EGFP-NleB1 or 
pGEX-NleB1 were used as template DNA and amplified by PCR using the primer 
pair pNleB1_(AAA)F/pNleB1_(AAA)R. Plasmids were digested with DpnI at 37 °C 
overnight before subsequent transformation into the appropriate E. coli strain. 
Yeast two-hybrid screening and β-galactosidase assay. NleB1 from EPEC strain 
E2348/69 (GenBank accession CAS10779) and NleB1 from EHEC O157:H7 strain 
EDL933 (GenBank accession NP_289553.1) were used as bait. The BD Matchmaker 
pre-transformed HeLa cDNA library (Clontech) was screened according to the 
manufacturer’s protocols (Clontech PT3183-1 manual) to identify HeLa proteins 
interacting with NleB. The yeast strain AH109 (MATa) was transformed with 
pGBT-NleB using the lithium acetate method and mated with Y187 (MATα) carrying 
the cDNA library in pGADT7-Rec plasmid. The mating mixtures were plated 
onto quadruple drop-out plates (Trp−, Leu−, Ade−, His−) to select for diploids 
expressing reporter genes. The pGADT7-Rec cDNA plasmids were selectively rescued 
from those diploids with positive protein interactions into E. coli KC8. The 
pGADT7-Rec cDNA plasmids were then sequenced using primer Rec744 to identify 
the cDNA inserts. 

β-Galactosidase assays were performed according to the manufacturer’s protocols 
(Clontech PT3024-1 manual). Briefly, pGADT7-Rec plasmid alone or with 
pGBT-NleB (or pGBT9, pGAD-FADD, pGAD-FADD_DED, pGAD-FADD_DD 
when necessary) were transformed into Saccharomyces cerevisiae strain PJ69-4A 
using the lithium acetate method. Transformants were selected on Trp− Leu− 
plates and grown to an optical density (D600 nm) of 0.6 before lysis and assay for the 
level of β-galactosidase activity using ONPG as a substrate. Data are from at least 
three biological replicates performed in triplicate. 
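
ONPG-based β-galactosidase activity is commonly expressed in Miller units. The sketch below uses the widely cited definition, 1,000 × OD420 / (t × V × OD600), with t in minutes and V the culture volume assayed in ml. The text above does not state which formula was applied, so this is an assumption, and the readings in the usage example are hypothetical.

```python
def miller_units(od420: float, minutes: float, volume_ml: float, od600: float) -> float:
    """beta-Galactosidase activity in Miller units.

    od420:     absorbance of the o-nitrophenol product
    minutes:   reaction time
    volume_ml: volume of culture assayed (times any concentration factor)
    od600:     optical density of the culture at harvest
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * od420 / (minutes * volume_ml * od600)


if __name__ == "__main__":
    # Hypothetical readings for a culture harvested at an optical density of 0.6.
    print(round(miller_units(od420=0.6, minutes=30.0, volume_ml=0.1, od600=0.6), 1))
```

Dividing by culture density and volume normalizes the signal to cell number, so interaction strengths can be compared between transformants that grew to slightly different densities.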

Bacterial adherence assay. Standard lines of HeLa and HEK293T cells are main- 
tained in our laboratory and regularly tested for mycoplasma contamination. To 
assess the level of bacterial replication during infection and treatment with FasL, 
bacterial cultures grown in DMEM were standardised by measurement of 
D600 nm and plated on selective media for assessment by viable count. HeLa cell 
monolayers were infected in duplicate with the same cultures for 3 h before being 
incubated for 1.5 h in media supplemented with or without 20 ng ml^-1 Fcγ-FasL. 
Following treatment, cells were washed three times and attached bacteria were 
resuspended in DMEM by pipetting and subsequently spread on selective media 
for assessment of bacterial viable count. Data are from at least three biological 
replicates performed in triplicate. 

Immunoprecipitation and detection of HA-, EGFP-, Flag-tagged proteins. All 
immunoprecipitation experiments were performed as at least three biological replicates. 
For immunoprecipitation by GFP-Trap-M (Chromotek), HEK293T cells 
were grown in 10-cm tissue culture dishes (Greiner Bio One) and co-transfected 
with pEGFP-NleB1 or pEGFP-NleB2 in combination with pRIPK1-Flag, 
pTRADD-Flag, pFADD-Flag or pFADD_ΔDD-Flag. Following transfection (after 
18–24 h), lysis and immunoprecipitation were carried out according to instructions 
for immunoprecipitation of GFP-fusion proteins provided by the GFP-Trap-M 
supplier. 

For immunoprecipitation of 2× haemagglutinin (HA) tagged proteins, HeLa 
cells cultured in T-75 cm^2 tissue culture flasks (Corning) were first transfected with 
FADD-Flag, RIPK1-Flag or TRADD-Flag expression constructs for 18 h followed 
by infection with various EPEC derivatives for 90–120 min. Cells were then washed 
three times with cold PBS and lysed in 1 ml cold lysis buffer (1% Triton X-100, 
50 mM Tris-HCl, pH 7.4, 1 mM EDTA, 150 mM NaCl and Complete Protease 
Inhibitor (Roche)) and incubated on ice for 10 min before collection. Cell debris 
was pelleted and equal volumes of supernatant collected. 100 µl of the supernatant 
was kept as input and 60 µl of monoclonal anti-HA-agarose (Clone HA-7, Sigma-Aldrich) 
was added to the remainder of the supernatant, which was incubated on a 
rotating wheel at 4 °C overnight. The agarose was washed three times with lysis 
buffer and resuspended in 50 µl of 2× SDS sample buffer. 

For immunoprecipitation of Fas receptor complexes (DISC) by protein G, HeLa 
cells were grown in 10-cm tissue culture dishes and infected with various EPEC 
derivatives for 2 h. Cells were then treated with 50 µg ml^-1 gentamycin and 1 µg of 
Fcγ-FasL for 1 h. An uninfected control was left untreated. Cells were lysed in 
400 µl of 2× DISC lysis buffer (20 mM Tris-HCl, pH 7.4, 150 mM NaCl, 2 mM 
EDTA, 1% Triton X-100, 10% glycerol and Complete Protease Inhibitor (Roche)) 
and incubated on ice for 10 min before centrifugation to remove cell debris. An 
aliquot (80 µl) of the supernatant was kept as input and 40 µl of Protein G 
Dynabeads (Invitrogen) was added to the remainder of the sample, which was 
placed on a rotating wheel at 4 °C overnight. Beads were washed three times with 
lysis buffer and finally resuspended in 75 µl of DISC lysis buffer with 25 µl of 5× 
SDS sample buffer. 

All samples were boiled for 5 min, subjected to SDS-PAGE and transferred to 
nitrocellulose membranes. Membranes were probed with the following primary 
antibodies as necessary: mouse monoclonal anti-HA.11 (16B12) (Covance), mouse 
monoclonal anti-Flag M2-HRP (Sigma-Aldrich), mouse monoclonal anti-FADD 
(BD Transduction Laboratories), rabbit monoclonal anti-Fas (C18C12) (Cell 
Signaling), rabbit polyclonal anti-caspase-8 (D35G2) (Cell Signaling), mouse 
monoclonal anti-GFP (7.1 and 13.1) (Roche) or mouse monoclonal anti-β-actin 
(AC-15) diluted 1:1,000 in TBS with 5% BSA and 0.1% Tween. Images were visualized 
using an MFChemiBis imaging station (DNR). 
Preparation of GST and His tagged proteins, demonstration of N-acetylglucosylation 
of recombinant FADD by NleB1 and detection by immunoblotting. 
Overnight cultures of BL21 (pGEX-4T-1), BL21 (pGEX-NleB1), BL21 
(pGEX-NleB1_AAA) and BL21 (pET-FADD), BL21 (pET-FADD_R117A), BL21 
(pET-FADD_S122A) grown in LB were diluted 1:100 in 200 ml of LB supplemented 
with either kanamycin (pET) or ampicillin (pGEX) (100 µg ml^-1) and grown with shaking 
at 37 °C to an optical density (D600 nm) of 0.6. Cells were incubated with 1 mM IPTG 
and grown for a further 2 h then pelleted by centrifugation. Proteins were purified 
by either nickel or glutathione affinity chromatography in accordance with the 
manufacturer’s instructions (Novagen). Protein concentrations were determined 
using a bicinchoninic acid (BCA) kit (Thermo Scientific). 

For detection of N-acetylglucosylation of FADD by NleB1, 2 µg of purified 
recombinant proteins were incubated either alone or in combination at 37 °C 
for 4 h in the presence of 1 mM UDP-GlcNAc. 5× SDS sample buffer was added 
to the samples before boiling for 5 min. Samples were subjected to SDS-PAGE and 
transferred to nitrocellulose membranes, which were subsequently probed with the 
following primary antibodies: mouse monoclonal anti-GlcNAc (CTD110.6) (Cell 
Signaling), which recognizes O-linked and N-linked GlcNAc”, rabbit polyclonal 
anti-GST (Cell Signaling), or mouse monoclonal anti-His (AD1.1.10) (AbD Serotech) 
diluted 1:1,000 in TBS with 5% BSA and 0.1% Tween. Detection and visualization were 
performed as described above. Detection of N-acetylglucosylated recombinant 
FADD was performed as three technical replicates. 

For detection of N-acetylglucosylation during transient transfection with NleB1 
and FADD, HEK293T cells were grown in 10-cm tissue culture dishes (Greiner Bio 
One) and co-transfected with pEGFP-NleB1 or pEGFP-NleB1_AAA in combination 
with pFADD-Flag. Cells were washed three times with cold PBS and lysed in 
600 µl of cold lysis buffer (1% Triton X-100, 50 mM Tris-HCl, pH 7.4, 1 mM 
EDTA, 150 mM NaCl and Complete Protease Inhibitor (Roche)) and incubated 
on ice for 10 min before collection. Cell debris was pelleted and equal volumes of 
supernatant collected. Then 80 µl of the supernatant was kept as input and 40 µl of 
monoclonal anti-Flag M2 magnetic beads (Sigma-Aldrich) was added to the 
remainder of the supernatant, which was incubated on a rotating wheel at 4 °C 
overnight. The beads were washed three times with cold lysis buffer followed by 
elution of bound proteins with 80 µg of Flag peptide (Sigma-Aldrich) to which 16 µl 
of 5× SDS sample buffer was added. Samples were boiled for 5 min, subjected to 
SDS-PAGE and transferred to nitrocellulose membranes. Membranes were probed 
with the following primary antibodies as necessary: mouse monoclonal anti-O/N-GlcNAc 
(CTD110.6) (Cell Signaling), mouse monoclonal anti-Flag M2-HRP 
(Sigma-Aldrich), mouse monoclonal anti-GFP (7.1 and 13.1) (Roche) or mouse 
monoclonal anti-β-actin (AC-15) diluted 1:1,000 in TBS with 5% BSA and 0.1% 
Tween. Images were visualized using an MFChemiBis imaging station (DNR). 
Glyco-site specific analysis of FADD. Intact protein analysis was performed by 
the Proteomics Laboratory at the Walter and Eliza Hall Institute (Fig. 2b and 
Extended Data Fig. 4). 10 µg of intact recombinant His-FADD, His-FADD_R117A 
or His-FADD_S122A incubated with GST-NleB1 and UDP-GlcNAc was injected 
and separated by nano-flow reversed-phase liquid chromatography on a nano LC 
system (1200 series, Agilent) using a custom packed C4 capillary column (15 cm 
× 0.15 mm inner diameter (I.D.), packed with Resprosil 300 Å C4 5-µm resin, Dr 
Maisch GmbH) using a linear 45-min gradient from 5 to 100% buffer B at a flow rate 
of 1.2 µl min^-1 (A: 0.1% Formic acid in Milli-Q water, B: 0.1% Formic acid, 80% 
acetonitrile (Mallinckrodt Baker), 20% Milli-Q water). The nano HPLC was coupled 
online to a Q-Exactive mass spectrometer equipped with a nano-electrospray ion 
source (Thermo Fisher Scientific) set to acquire full scan and all-ion-fragmentation 
(AIF) sequentially. Mass accuracy of the intact protein was achieved by summing 
multiple lower resolution (35,000) full scan MS events. The intact mass and error 
calculation was facilitated by the open source program mMass. 

Tryptic peptide analysis was performed by the Proteomics Laboratory at the 
Walter and Eliza Hall Institute (Fig. 2c and Extended Data Fig. 2). Reacted and 
unreacted mixtures of recombinant FADD and NleB1 were purified by SDS- 
PAGE, bands corresponding to FADD were excised, and manual in-gel reduc- 
tion, alkylation and tryptic digestion were performed. Extracted peptides were 
injected and separated by nano flow reversed-phase liquid chromatography on 
a nano LC system (1200 series, Agilent) using a nanoAcquity C18 150 mm × 0.15 
mm I.D. column (Waters) with a linear 45-min gradient from 5 to 100% buffer B 
set at a flow rate of 1.2 µl min⁻¹ (A: 0.1% Formic acid in Milli-Q water, B: 0.1% 
Formic acid, 80% acetonitrile, (Mallinckrodt Baker, New Jersey, USA) 20% Milli-Q 
water). The nano HPLC was coupled online to an LTQ-Orbitrap XL mass spectro- 
meter equipped with a nano-electrospray ion source (Thermo Fisher Scientific) for 
automated MS/MS using multi-stage activation (MSA) setting. The Orbitrap was 
run in a data-dependent acquisition mode with the Orbitrap resolution set at 
30,000 and the top-three multiply charged species selected for fragmentation in 
the linear ion trap by collision-induced dissociation (CID) (single charged species 
were ignored). Fragment ions were analysed on the Orbitrap with the resolution set 
at 7,500 with the ion threshold set to 15,000 counts. The activation time was set to 
30 ms, normalized collision energy set to 35. Raw files consisting of full-scan MS 
and high resolution MS/MS spectra were converted to the MGF data format with 
Proteome Discoverer 1.4 and searched against UniProt database (2013/02) limi- 
ting the search to mouse taxonomy and custom FASTA database containing the 
protein constructs of FADD and NleB using Mascot v2.4. Mascot parameters for 
each search included Trypsin/no P enzyme with six missed cleavages, a fixed 
modification in the form of carbamidomethyl at cysteine residues and variable 
modification of HexNAc at Ser/Thr/Asn/Arg residues, acetyl at protein 
N-terminal, oxidation at methionine residues. Spectra were searched with a mass 
tolerance of 20 p.p.m. in MS mode and 40 m.m.u. in MS/MS mode. The MS/MS 
fragmentation of modified peptides obtained from the Mascot search was manu- 
ally sequenced using mMass to assign ion peaks and confirm the site of modifica- 
tion. Detection of N-acetylglucosylated recombinant FADD by mass spectrometry 
was performed as two technical replicates. 
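
The two search tolerances quoted above use different units: the MS tolerance is relative (p.p.m.) and the MS/MS tolerance is absolute (m.m.u., thousandths of a mass unit). The following is a minimal sketch of how such tolerances can be checked, not the Mascot implementation; the function names and the example m/z values are hypothetical.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (relative to the theoretical value)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_ppm(observed_mz: float, theoretical_mz: float, tol_ppm: float = 20.0) -> bool:
    """True if the precursor error is inside a relative tolerance (default 20 p.p.m.)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_mmu(observed_mz: float, theoretical_mz: float, tol_mmu: float = 40.0) -> bool:
    """True if the fragment error is inside an absolute tolerance (default 40 m.m.u.).

    1 m.m.u. is 0.001 mass units, so the difference is scaled by 1000.
    """
    return abs(observed_mz - theoretical_mz) * 1000.0 <= tol_mmu


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the two units.
    print(within_ppm(1000.01, 1000.0))   # 10 p.p.m. off, inside 20 p.p.m.
    print(within_mmu(500.05, 500.0))     # 50 m.m.u. off, outside 40 m.m.u.
```

A relative tolerance widens with m/z while an absolute one does not, which is why the two scan types are specified separately.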

The site of modification was verified independently using electron transfer 
dissociation (ETD) fragmentation on an Orbitrap Elite mass spectrometer by 
the Bio21 Mass Spectrometry and Proteomics Facility at the University of 
Melbourne (Extended Data Fig. 3). The protein complex was digested with 
Endoproteinase AspN (Roche) at an enzyme to protein ratio of 1:5 at 37 °C over- 
night. The reaction contained FADD, NleB1 and UDP-GlcNAc and the incubation 
was performed as described earlier for anti-GlcNAc immunoblot analysis. LC-MS/ 
MS was carried out on a LTQ Orbitrap Elite (Thermo Scientific) with an EASY 
nano electrospray interface coupled to an Ultimate 3000 RSLC nanosystem 
(Dionex). The nanoLC system was equipped with an Acclaim Pepmap nano-trap 
column (Dionex, C18, 100 Å, 75 µm × 2 cm) and a Thermo EASY-Spray column 
(Pepmap RSLC C18, 2 µm, 100 Å, 75 µm × 25 cm). 4 µl of the peptide mix was 
loaded onto the enrichment (trap) column at an isocratic flow of 4 µl min⁻¹ of 3% 
CH3CN containing 0.1% formic acid for 5 min before the enrichment column was 
switched in-line with the analytical column. The eluents used for the liquid chro- 
matography were 0.1% (v/v) formic acid (solvent A) and 100% CH3CN/0.1% 
formic acid (v/v) (solvent B). The following gradient was used: 
3% B to 12% B for 1 min, 12% B to 35% B in 20 min, 35% B to 80% B in 2 min and 
maintained at 80% B for 2 min and equilibration at 3% B for 7 min before the next 
sample injection. The LTQ Orbitrap Elite mass spectrometer was operated in the 
data dependent mode with nano ESI spray voltage of +2.0 kV, capillary temper- 
ature of 250 °C and S-lens RF value of 60%. Spectra were acquired sequentially, 
first in positive mode with a full scan from m/z 300-1650 in the FT mode at 
120,000 resolution, followed by CID, ETD and high energy collision dissociation 
(HCD). CID in the linear ion trap was carried out in parallel, with the three 
most intense peptide ions with charge states ≥ 2 isolated and fragmented using 
normalized collision energy of 35 and activation Q of 0.25. The decision tree 
procedure for ETD activation was also applied whereby peptides with charge state 
3 of <650 m/z, charge state 4 of <900 m/z and charge state 5 of <1,000 m/z were 
subjected to 100 ms of ETD activation. HCD at 15,000 resolution using normalized 
collision energy of 35 and 0.1 ms activation time was carried out on every precursor 
selected. The mass spectra were searched against the Uniprot database using the 
Sequest HT (V1.3) search algorithm as part of the Proteome Discoverer 1.4 
Workflow (Thermo Scientific). Searching parameters used were: variable modifi- 
cations (HexNAc of N, S, T and R; 203.079), 2 missed tryptic cleavages, 10 p.p.m. 
peptide mass tolerance and 0.6 Da fragment ion mass tolerance. The false-discovery 
rate (derived from corresponding decoy database search) was less than 1%. Peptides 
of interest were manually inspected and validated. 
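
The ETD decision tree described above reduces to a lookup of a charge-dependent m/z ceiling. The sketch below encodes only the three thresholds given in the text; it is an illustration, not the instrument method, and it assumes (the text does not say) that precursors with any other charge state are not sent to ETD. The names `ETD_MZ_LIMITS` and `use_etd` are hypothetical.

```python
# Upper m/z limits for ETD activation, keyed by precursor charge state,
# taken from the thresholds stated in the Methods text.
ETD_MZ_LIMITS = {3: 650.0, 4: 900.0, 5: 1000.0}


def use_etd(charge: int, mz: float) -> bool:
    """Return True if a precursor falls inside the ETD decision-tree window.

    Assumption: charge states not listed in ETD_MZ_LIMITS are never selected.
    """
    limit = ETD_MZ_LIMITS.get(charge)
    return limit is not None and mz < limit


if __name__ == "__main__":
    # The 4+ precursor at m/z 541.5449 reported in Extended Data Fig. 3
    # lies below the 900 m/z ceiling for charge state 4.
    print(use_etd(4, 541.5449))
```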

IL-8 secretion assay. For analysis of IL-8 secretion, HeLa cell monolayers were 
infected for 3 h before being incubated for 8-12 h in media supplemented with 50 µg 
ml⁻¹ gentamycin with or without 20 ng ml⁻¹ TNF (Calbiochem, EMD4Biosciences). 
Following this, the HeLa cell supernatant was collected and either used immediately 
or stored at −20 °C for subsequent analysis of IL-8 secretion. IL-8 secretion was 
measured using the Quantikine Human IL-8 Immunoassay (R&D Systems) accord- 
ing to the manufacturer’s instructions. Samples were measured on a FLUOstar 
Omega microplate reader (BMG Labtech). Differences in IL-8 secretion were assessed 
for significance by one-way analysis of variance (ANOVA) with Dunnett’s multiple 
comparison post-test. Data are from at least three biological replicates performed in 
triplicate. 

Detection of cleaved caspase-8 by immunoblotting. For detection of cleaved 
caspase-8 during EPEC infection, HeLa cells were grown in 24-well tissue culture 
plates (Greiner Bio One) and infected for 3 h with various EPEC derivatives. Cells 
were then treated with 50 µg ml⁻¹ gentamicin, and either 20 ng ml⁻¹ of Fcγ-FasL, 
20 ng ml⁻¹ TNF or a combination of 20 ng ml⁻¹ TNF and 10 µg ml⁻¹ cyclohexi- 
mide for a further 60 min and lysed in 60 µl of 2× SDS sample buffer. For detection 
of cleaved caspase-8 by transfection, HeLa cells were grown in 24-well tissue culture 
plates and either left untransfected or transfected with pEGFP-C2, pEGFP-NleB1 
or pEGFP-NleB1, q for 18-24 h using FuGENE 6 (Roche). Cells were then treated 
with 20 ng ml⁻¹ of Fcγ-FasL for a further 60 min and collected by lysis in 60 µl of 
2× SDS sample buffer. All samples were boiled for 5 min and subjected to SDS- 
PAGE followed by transfer to nitrocellulose membranes (PALL). Membranes were 
incubated at least overnight at 4°C with rabbit polyclonal anti-cleaved caspase-8 
(Asp391) (18C8) (Cell Signaling) diluted 1:1,000 in TBS with 5% skimmed milk and 
0.1% Tween (Sigma-Aldrich), rabbit polyclonal anti-caspase-8 (D35G2) (Cell 
Signaling) diluted 1:1,000 or mouse monoclonal anti-β-actin (AC-15) (Sigma- 
Aldrich) diluted 1:5,000 in TBS with 5% BSA (Sigma-Aldrich) and 0.1% Tween. 
Proteins were detected using anti-rabbit or anti-mouse IgG secondary antibodies 
conjugated to horseradish peroxidase (PerkinElmer) and developed with either 
enhanced chemiluminescence (ECL) western blotting reagent (Amersham) or 
Western Lightning Ultra reagent (PerkinElmer). All secondary antibodies were 
diluted 1:3,000 in TBS with 5% BSA and 0.1% Tween. Images were visualized using 
an MFChemiBis imaging station (DNR). At least three biological replicates were 
performed for all experiments. 

Detection of cleaved caspase-8 and caspase-3 in vitro by immunofluorescence 
microscopy. For visualization of caspase-8 cleavage by immunofluorescence 
microscopy, HeLa cells were either infected with EPEC E2348/69 derivatives for 
3 h or transfected for 16 h followed by an additional 1 h of treatment with 20 ng 
ml⁻¹ Fcγ-FasL. One experiment used recombinant rhFasL to treat transfected 
HeLa cells for comparison with tunicamycin treatment (Extended Data Fig. 1d, e). 
For this, HeLa cells were transfected for 16 h and treated with 200 ng ml⁻¹ of 
rhFasL (R&D Systems). For visualization of caspase-3 cleavage induced by tuni- 
camycin, HeLa cells were transfected for 16 h, followed by 18 h treatment with 
5 µg ml⁻¹ tunicamycin (Sigma-Aldrich). Cells were then fixed in 3.7% (w/v) form- 
aldehyde (Sigma-Aldrich) in PBS for 10 min and permeabilized with 0.2% Triton 
(Sigma-Aldrich) for 4 min. Following 30 min blocking in PBS with 3% (w/v) BSA, 
samples were exposed to primary antibodies (specific to cleaved caspase-8 or 
cleaved caspase-3; Cell Signaling Technology) diluted in blocking solution for 
1 h at 20 °C, except for cleaved caspase-3, which was incubated overnight at 
4 °C. For fluorescent actin staining in EPEC infected cells, 0.5 µg ml⁻¹ of phalloi- 
din conjugated to rhodamine (Sigma-Aldrich) was added during primary anti- 
body incubation. 4’,6-diamidino-2-phenylindole (DAPI, Invitrogen) was applied 
at 0.5 µg ml⁻¹ in blocking solution for 5 min at 20 °C post-secondary antibody 
treatment. Secondary antibodies were all coupled to Alexa Fluor dyes (Invitrogen) 
and applied to cells at 1:2,000 in blocking solution for 1 h at 20 °C. Coverslips were 
mounted onto microscope slides with Prolong Gold anti-fade reagent (Invitrogen). 
Images were acquired using a Zeiss confocal laser scanning microscope with a 
100×/EC Epiplan-Apochromat oil immersion objective. Cleavage of caspase-8 or 
caspase-3 was quantified from at least three biological replicates performed in 
duplicate for both transfection and infection studies counting at least 100 cells 
per field. 


Cell viability assays (MTT and propidium iodide staining). HeLa cells were 
transfected in 24-well tissue culture plates for 18-24 h before being left untreated 
or treated with 20 ng ml⁻¹ of Fcγ-FasL for a further 60 min. The cells were 
washed once with PBS and the medium was replaced with DMEM containing 0.1 µg ml⁻¹ 3-(4,5- 
dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) (Sigma) for 1 h, 
after which the medium was removed and 100 µl of dimethylsulphoxide (DMSO, 
Sigma) was added to each well. After thorough mixing on an orbital shaker for 
1 min, the absorbance at 540 nm for each well was obtained using a FLUOstar 
Omega microplate reader (BMG Labtech). Results were obtained from at least 
three biological replicates performed in triplicate. 

For analysis of cell viability by PI staining, HeLa cell monolayers were infected 
for 3 h before being incubated for 1 h in media supplemented with 50 µg ml⁻¹ 
gentamycin with 20 ng ml⁻¹ Fcγ-FasL. Propidium iodide (50 µg ml⁻¹) was added 
for the final 15 min of treatment. Cells were fixed and stained for confocal micro- 
scopy as previously mentioned. Phalloidin conjugated to FITC (Sigma-Aldrich) 
was used at 0.5 µg ml⁻¹ to stain for actin. Duplicate coverslips were counted for PI 
positive cells, and results were obtained from at least three biological replicates 
counting at least 100 cells per field. 

Animal experiments. All animal experimentation was approved by the University 
of Melbourne Animal Ethics Committee, applications 0808209 and 1112062. All 
genetically modified and spontaneous mutant mice are on a C57BL/6 background. 
Overnight cultures of bacterial strains were pelleted by centrifugation and resus- 
pended in PBS. C57BL/6 (wild-type), Fas(lpr/lpr), Fasl(gld/gld) and Bid−/− mutant mice 
(6 to 8 weeks old, male and female mice) were inoculated by oral gavage with 200 µl 
containing approximately 1 × 10” c.f.u. of C. rodentium, or a lower dose of 1 × 10° 
c.f.u. when testing for prolonged infection (up to 33 days). The viable count of the 
inoculum was determined retrospectively by plating dilutions of the inoculum on 
plates with appropriate antibiotics. Animals were allocated to experimental groups 
to ensure even distribution of age, sex and weight. Investigators were not blinded 
to group allocation. Sample size was at least five animals per group for statistical 
power per biological replicate. Larger sample sizes (up to 12 mice per group) were 
used if mice were available. Infection experiments were repeated at least twice 
regardless of sample size to ensure reproducibility and results represent all experi- 
mental data combined from repeated infections. Mice were weighed every 2 days 
and faeces collected every 2 or 4 days for enumeration of c.f.u. The viable bacterial 
count per 100 mg of faeces was determined by plating serial dilutions in duplicate 
of faeces onto medium containing selective antibiotics. 
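
The faecal counts described above are obtained by back-calculating from colonies on a dilution plate to c.f.u. per 100 mg of sample. The sketch below shows one common way to do that arithmetic; the suspension volume, plated volume, dilution and colony counts are hypothetical illustration values, not figures from this study.

```python
import math
from statistics import mean


def cfu_per_100mg(colonies: float, dilution_factor: float,
                  plated_volume_ml: float, suspension_volume_ml: float,
                  sample_mass_mg: float) -> float:
    """Back-calculate c.f.u. per 100 mg of sample from one dilution plate.

    dilution_factor is the fold-dilution of the plated tube (for example 1e4
    for a 10^-4 dilution). All argument names are illustrative.
    """
    cfu_per_ml = colonies * dilution_factor / plated_volume_ml
    total_cfu = cfu_per_ml * suspension_volume_ml
    return total_cfu / sample_mass_mg * 100.0


if __name__ == "__main__":
    # Hypothetical duplicate plates of a 10^-4 dilution, 0.1 ml plated,
    # from 50 mg of faeces resuspended in 1 ml.
    duplicate_counts = [40, 44]
    load = cfu_per_100mg(mean(duplicate_counts), 1e4, 0.1, 1.0, 50.0)
    print(f"{load:.3g} c.f.u. per 100 mg, log10 = {math.log10(load):.2f}")
```

Averaging the duplicate plates before scaling, as done here, is one reasonable choice; the figures in this paper report the result on a log10 scale.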

Histological analysis and scoring of tissue pathology and diarrhoea. The 
increased intestinal pathology observed in C. rodentium-infected FAS-, FasL- 
and Bid-deficient mice revealed a function for the FAS apoptotic pathway in 
limiting the extent of colitis during gastrointestinal infection. This exacerbated 
disease compared to C. rodentium infection of wild-type mice likely arose from the 
difference between a complete block of FAS signalling in the gut epithelium of 
FAS-deficient mice as opposed to less complete inhibition of FAS signalling in the 
gut epithelium by NleB in wild-type mice, which depends on the timing and 
efficiency of C. rodentium attachment and NleB translocation. A grading system 
for diarrhoea was developed in which a score of diarrhoea severity from 0-4 was 
recorded every 4 days. The scoring system can be interpreted as follows: (0) (no 
diarrhoea): solid stool with no sign of soiling around the anus. The stool is very 
firm when subjected to pressure with tweezers; (1) (very mild diarrhoea): formed 
stools that appear moist on the outside, and no sign of soiling around the anus. 
Stool is less firm when considerable pressure applied with tweezers; (2) (mild 
diarrhoea): formed stools that appear moist on the outside, and some signs of 
soiling around anus. Stools will easily submit to pressure applied with tweezers; 
(3) (diarrhoea): no formed stools with a mucous-like appearance. Considerable 
soiling around the anus and the fur around tail. Mouse takes a long time to pass 
stool; (4) (severe, watery diarrhoea): mostly clear or mucous-like liquid stool with 
very minimal solid present and considerable soiling around anus. Mouse may not 
be able to pass stool at all and may have a hunched appearance. Occasionally, 
blood may be observed in the stool. 

For histological analysis, mice inoculated with the higher dose of C. rodentium 
(1 × 10° c.f.u.) were culled at day 10 or 12 post-infection and the distal portion of 
colon from the caecum to the rectum removed aseptically and weighed after 
removal of faecal pellets. Colons and spleens were homogenized mechanically 
in PBS using a Seward 80 stomacher, and the bacterial c.f.u. per 100 mg of organ 
homogenate was enumerated by plating onto LB containing the appropriate anti- 
biotics. Colons from a representative group of these mice were collected and fixed 
in 4% (w/v) paraformaldehyde (Sigma-Aldrich) and sectioned for haematoxylin 
and eosin staining and assessment of gut pathology. A scoring system (0-3) was 
used by a veterinary pathologist to assess the extent of tissue damage: (0) no damage; 
(1) discrete lesion; (2) mucosal erosion; and (3) extensive mucosal damage/ulceration 
(extending into muscularis and deeper). 
Detection of C. rodentium and cleaved caspase-8 in mouse colon tissue by 
immunofluorescence microscopy. For immunofluorescence staining of mouse 
colons, the distal portion of colon from the caecum to the rectum was removed 
from representative groups of infected and uninfected C57BL/6 (wild-type) and 
Fas(lpr/lpr) mice, gently flushed with a 1:1 mix of OCT compound (Tissue-Tek) and 
PBS and frozen at −80 °C in full strength OCT compound. 10-µm sections were 
cut on a Leica CM3050S cryostat and mounted onto Menzel-Glaser Superfrost 
Plus slides (Thermo Fisher Scientific). Slide sections were dried for at least 2 h 
before fixing in absolute acetone at −20 °C, followed by a 30 min block with Dako 
Protein-Free Block Serum (Dako). Sections were incubated with primary antibody; 
polyclonal rabbit anti-C. rodentium O152 (Statens Serum Institute) diluted 1:100 
or mouse specific monoclonal rabbit anti-cleaved caspase-8 (Asp387) (D5B2) XP 
diluted 1:500 in 2.5% normal donkey serum (NDS) (Jackson Immunoresearch 
Laboratories) in PBS. Secondary antibody coupled to Alexa Fluor-488 was applied 
at 1:1,000 in 2.5% NDS followed by 2 min of exposure to Hoechst 33258 
(Invitrogen) diluted at 1:2,000 in 2.5% NDS. Slides were mounted with glass cover- 
slips using Prolong Gold anti-fade reagent (Invitrogen) and images were acquired 
using a Zeiss LSM700 inverted Axio Observer with LED laser lines at 405 nm and 
488 nm, and a Plan-Apochromat 20×/0.8 objective. To determine the average 
depth of penetration by C. rodentium in mouse intestinal crypts, ImageJ was used 
to measure the distance of bacterial staining from the epithelial surface on immu- 
nofluorescent images. Results are from five independent sections with at least three 
measurements taken from each section. To quantify the amount of cleaved caspase- 
8 in infected mouse colons, the relative fluorescence intensity of cleaved caspase-8 
staining was expressed as a percentage of the relative fluorescence intensity of 
Hoechst staining from 5 independent infected fields of 3 mice for each bacterial 
strain. 
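
The normalization described above, in which cleaved caspase-8 signal is expressed as a percentage of the Hoechst signal and then summarized across fields, can be sketched as follows. The intensity values and function names are hypothetical; how intensities were extracted from the images is not specified in the text and is not modelled here.

```python
from statistics import mean


def caspase8_percent_of_hoechst(caspase_intensity: float,
                                hoechst_intensity: float) -> float:
    """Express cleaved caspase-8 fluorescence as a percentage of Hoechst fluorescence."""
    if hoechst_intensity <= 0:
        raise ValueError("Hoechst intensity must be positive")
    return 100.0 * caspase_intensity / hoechst_intensity


def summarize_fields(fields) -> float:
    """Mean percentage over (caspase, Hoechst) intensity pairs, one pair per field."""
    return mean(caspase8_percent_of_hoechst(c, h) for c, h in fields)


if __name__ == "__main__":
    # Hypothetical relative intensities for five fields from one strain.
    fields = [(12.0, 100.0), (18.0, 120.0), (9.0, 90.0), (20.0, 110.0), (15.0, 100.0)]
    print(f"{summarize_fields(fields):.1f}% of Hoechst signal")
```

Dividing by the nuclear stain corrects for differences in the amount of tissue captured in each field.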

Hyperplasia of colonic crypts (crypt height) was measured using MIRAX soft- 
ware on haematoxylin and eosin stained sections at day 10 and 12 post-infection. 
Data are from 4 sections of distal colon measured at least 50 µm apart per animal 
from at least three individual mice per group. 
Statistical analysis. All statistical analyses were performed using GraphPad Prism 
version 6.0. Statistical tests used were unpaired two-tailed Student’s t-test for 
pairwise comparisons between groups or one-way ANOVA with Dunnett’s mul- 
tiple comparison test for multiple comparisons as indicated. Variance was similar 
in all comparisons. Differences in faecal counts of C. rodentium from mice and 
diarrhoea scores were assessed using a Mann-Whitney U-test, in which normal 
distribution was not assumed. P < 0.05 was considered to be significant. 
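
The analyses above were run in GraphPad Prism. Purely to illustrate what the Mann-Whitney U-test computes for faecal counts or diarrhoea scores, the sketch below derives the U statistic by pairwise comparison and a two-sided P value from the normal approximation. It applies no tie or continuity correction, so its P values are approximate and will differ from exact small-sample values; the data are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, two-sided P) for two independent samples.

    U is the smaller of the two rank-sum statistics. P uses the normal
    approximation without tie or continuity correction, an assumption made
    here for brevity.
    """
    n1, n2 = len(x), len(y)
    # Count pairs where x exceeds y; ties contribute one half.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, min(p, 1.0)


if __name__ == "__main__":
    # Hypothetical log10 c.f.u. values for two groups of five mice.
    group_a = [5.1, 5.4, 5.0, 5.6, 5.3]
    group_b = [7.2, 7.0, 6.8, 7.5, 7.1]
    u, p = mann_whitney_u(group_a, group_b)
    print(f"U = {u}, P = {p:.4f}, significant at 0.05: {p < 0.05}")
```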


23. Datsenko, K. A. & Wanner, B. L. One-step inactivation of chromosomal genes in 
Escherichia coli K-12 using PCR products. Proc. Natl Acad. Sci. USA 97, 
6640-6645 (2000). 
Extended Data Figure 1 | NleB1 binds FADD, TRADD and RIPK1 and 
blocks extrinsic apoptosis. a, Co-immunoprecipitation of bacterially 
delivered NleB1-2HA with FADD-Flag, TRADD-Flag and RIPK1-Flag in 
HeLa cells. Lane 1, HeLa cells infected with EPEC ΔnleB1 (pNleB1) (negative 
control). Lane 2, HeLa cells infected with EPEC ΔnleB1 (pNleB1-2HA). Actin, 
loading control. Representative immunoblot from at least three independent 
experiments. b, GFP-Trap pull-down of FADD-Flag or FADDΔDD-Flag. 
NleB1 from EPEC E2348/69, EHEC O157:H7 EDL933 and C. rodentium 
ICC169 fused with EGFP were used as indicated. Actin, loading control. 
Representative immunoblot from at least three independent experiments. 

c, HeLa cells were infected with derivatives of EPEC as indicated for 4h and left 
unstimulated (dark grey bars) or stimulated with TNF for 8 h (light grey bars). 
Results are the mean ± s.e.m. of at least three independent experiments carried 
out in duplicate. The * indicates significantly greater than uninfected, 
unstimulated cells, P < 0.0001, one-way ANOVA with Dunnett’s multiple 
comparison test. d, Apoptosis was induced by FasL and HeLa cells were stained 
with antibodies to cleaved caspase-8. Arrows indicate cells with cleaved 
caspase-8. Scale bar, 5 µm. Representative images from at least three 
independent experiments. e, Quantification of cleaved caspase-8 or cleaved 
caspase-3 by immunofluorescence microscopy in HeLa cells expressing EGFP 
or EGFP-NleB1. UT, untransfected. Results are the mean ± s.e.m. of 
percentage of cells with cleaved caspase-8 or cleaved caspase-3 from at least 
three independent experiments counting ~200 cells in triplicate. *P < 0.0001, 
unpaired, two-tailed t-test. f, N-acetylglucosylation of FADD-Flag detected 
using a monoclonal antibody to GlcNAc of immunoprecipitated lysate from 
HEK293T cells. Actin, loading control. Representative immunoblot from at 
least three independent experiments. 
Extended Data Figure 2 | Mass spectra of peptides identifying FADD Arg 117 GlcNAc modification. Fragment ions of overlapping tryptic missed cleaved 
peptides confirm Arg 117 of FADD as the modification site. 
Extended Data Figure 3 | Mass spectra obtained using electron transfer 
dissociation fragmentation on the Orbitrap Elite mass spectrometer. a, Red 
and blue spectra represent the c and z series ions that match the peptide 
sequence DWKRLAR-HexNAc-ELKVSEAKM (addition of 203.0793 Da) 
produced from an AspN endoproteinase digest of FADD incubated with 
NleB1. Insert box shows the precursor ion present as a 4+ charge isotope series 
with a delta mass of 0.15 p.p.m. between the measured and theoretical mass of 
the modified peptide. Xcorr score of 6.26 was obtained using the Sequest HT 
(V1.3) search engine. b, Total ion current of the AspN digest. c, Extracted ion 
chromatogram of mass 541.5449 belonging to the HexNAc modified peptide 
DWKRLARELKVSEAKM using isolation mass tolerance of 10 p.p.m. d, ETD 
fragmentation spectra of the 541.55 precursor ion. Insert table shows the 
theoretical masses associated with the respective fragmentation ions. 
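
As a consistency check on the values in this legend, the m/z of the 4+ precursor can be estimated from the peptide sequence plus one HexNAc. The sketch below is an illustration only: the residue, water and proton masses are standard monoisotopic values rounded to five decimal places (an assumption that limits accuracy to roughly 1 p.p.m.), the HexNAc mass is the 203.0793 Da given in the legend, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) for the amino acids in this peptide only.
MONOISOTOPIC_RESIDUE = {
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "K": 128.09496,
    "L": 113.08406, "M": 131.04049, "R": 156.10111, "S": 87.03203,
    "V": 99.06841, "W": 186.07931,
}
WATER = 18.01056      # added once per peptide for the free termini
PROTON = 1.007276     # mass of each charge-carrying proton
HEXNAC = 203.0793     # modification mass stated in the legend


def precursor_mz(sequence: str, charge: int, modification_mass: float = 0.0) -> float:
    """Monoisotopic m/z of a peptide carrying `charge` protons."""
    neutral = sum(MONOISOTOPIC_RESIDUE[aa] for aa in sequence) + WATER + modification_mass
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    mz = precursor_mz("DWKRLARELKVSEAKM", charge=4, modification_mass=HEXNAC)
    # Compare against the 541.5449 extracted-ion mass reported in panel c.
    print(f"calculated m/z = {mz:.4f}, reported 541.5449")
```

The calculated value lands within a few ten-thousandths of an m/z unit of the reported precursor, which is consistent with a single HexNAc on this peptide at charge 4+.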
Extended Data Figure 4 | GlcNAc modification of FADD by GST-NleB1. 
a, Intact protein mass spectrometry of FADD with an R117A substitution 
incubated with GST-NleB1 and UDP-GlcNAc reveals no GlcNAc 
modification. b, Intact protein mass spectrometry of FADD with S122A 
substitution incubated with GST-NleB1 and UDP-GlcNAc reveals a single 
GlcNAc modification. c, In vitro assay for NleB1 GlcNAc modification of 
mutated forms of FADD as indicated. Representative immunoblot from at least 
three independent experiments. 
Extended Data Figure 5 | Comparison of FasL and TNF-mediated caspase-8 
processing and analysis of NleB2. a, GFP-Trap pull-down of FADD-Flag, 
TRADD-Flag or RIPK1-Flag with EGFP-NleB2 in HEK293T cells. Actin, 
loading control. Representative immunoblot from at least three independent 
experiments. b-d, Co-immunoprecipitation of bacterially delivered NleB2- 
2HA with FADD-Flag, TRADD-Flag and RIPK1-Flag in HeLa cells using HA 
antibodies. Lane 1, HeLa cells infected with EPEC ΔnleB2 (pNleB2) (negative 
control). Lane 2, HeLa cells infected with EPEC ΔnleB2 (pNleB2-2HA). Actin, 
loading control. Representative immunoblots from at least three independent 
experiments. e, Unprocessed and cleaved caspase-8 in HeLa cells infected for 
3 h with derivatives of EPEC and left untreated or treated with Fcγ-FasL or 
TNF for 1 h, as indicated. Representative immunoblot from at least three 
independent experiments. f, Unprocessed and cleaved caspase-8 in HeLa cells 
infected for 45 min with derivatives of EPEC and left untreated or treated with 
TNF and cycloheximide (CHX) for 1 h as indicated. Actin, loading control. 
Representative immunoblot from at least three independent experiments. 
g, Adherence of EPEC to HeLa cells. HeLa cells were infected with derivatives of 
EPEC for 3 h followed by treatment with FasL for 1 h. Number of recovered 
bacteria and the inoculum are shown for comparison. Mean ± s.e.m. are 
indicated. Data are from three independent experiments performed in 
triplicate. 
Extended Data Figure 6 | Infection of C57BL/6 and Fas(lpr/lpr) mice with 
C. rodentium. a, Immunofluorescence staining of cleaved caspase-8 in 
different colonic sections from C57BL/6 mice infected with wild type 
C. rodentium and an nleB mutant as indicated, using antibodies to C. rodentium 
O-antigen (anti-O152, green) and cleaved caspase-8 (red). Intestinal tissue was 
visualized with Hoechst staining for DNA (blue). Arrows indicate cleaved 
caspase-8 positive cells sloughed into the gut lumen. Scale bar, 100 µm. 
Representative images from at least three separate sections of colon at least 
100 µm apart (transverse or longitudinal), per animal from five individual mice 
per group. b, Time course of infection of C57BL/6 and Fas(lpr/lpr) mutant mice 
with C. rodentium. Each data point represents log10 c.f.u. per 100 mg faeces per 
individual animal on the indicated days. Mean ± s.e.m. are indicated, dotted 
line represents detection limit. C. rodentium, mice infected with wild type 
C. rodentium; ΔnleB, mice infected with C. rodentium nleB mutant. *P < 0.05, 
**P < 0.01, ***P < 0.001. P values from Mann-Whitney U-test. 
Extended Data Figure 7 | Histological analysis of intestinal sections at day 
10 from FAS-deficient mice infected with C. rodentium. a, Bacterial load in 
the colon of mice infected with C. rodentium (CR). Each data point represents 
log10 c.f.u. per 100 mg colon per individual animal on day 10 post-infection. 
C57BL/6 CR, wild-type mice infected with wild-type C. rodentium; C57BL/6 
ΔnleB, wild-type mice infected with C. rodentium nleB mutant; Fas(lpr/lpr) CR, 
Fas(lpr/lpr) mice infected with wild-type C. rodentium; dotted line is the detection 
limit. P = 0.0002, Mann-Whitney U-test. Mean ± s.e.m. are indicated. 
b, Resected colon weights (between rectum and caecum) of individual animals 
on day 10 post-infection. C57BL/6 CR, wild-type mice infected with wild-type 
C. rodentium; C57BL/6 ΔnleB, wild-type mice infected with C. rodentium nleB 
mutant; Fas(lpr/lpr) CR, Fas(lpr/lpr) mice infected with wild type C. rodentium. 
P = 0.0164, Mann-Whitney U-test. Mean ± s.e.m. are indicated. 
c, Mean ± s.e.m. crypt height in µm in haematoxylin and eosin stained sections 
from C57BL/6 and Fas(lpr/lpr) mice infected with wild type C. rodentium. UI, 
uninfected. Data are from 4 sections of colon measured at least 50 µm apart per 
animal from at least 3 individual mice per group. d, Mean ± s.e.m. tissue 
damage score in colon sections for individual mice infected with wild type 
C. rodentium. Scoring system is described in the Methods. UI, uninfected. 
P < 0.05, Mann-Whitney U-test. e, Haematoxylin and eosin stained sections of 
colon from C57BL/6 (wild-type) or Fas(lpr/lpr) mice uninfected or infected with 
wild type C. rodentium (day 10 post-infection). UI, uninfected. Scale bar, 
100 µm. Mucosal inflammation (asterisk), neutrophil invasion of the 
muscularis mucosa (arrow), erosion of the epithelium (arrowhead). 
Representative images from two sections of colon at least 100 µm apart 
(transverse or longitudinal) from at least three individual mice per group. 
Extended Data Figure 8 | Diarrhoea score and histological analysis of 
intestinal sections at day 12 from C57BL/6 (wild-type), Fas(lpr/lpr), Fasl(gld/gld) 
and Bid−/− mice infected with C. rodentium. a, Bacterial load in the faeces of 
different mouse strains infected with wild-type C. rodentium as indicated. Each 
data point represents log10 c.f.u. per 100 mg colon per individual animal. 
Dotted line is detection limit. Mean ± s.e.m. are indicated. b, Diarrhoea score at 
day 12 post-infection of different strains of mice infected with C. rodentium. 
Scoring system is described in the Methods. Mean ± s.e.m. are indicated. 
P values from one-way ANOVA. c, Mean ± s.e.m. crypt height in µm in 
haematoxylin and eosin stained sections from different mouse strains infected 
with wild-type C. rodentium. UI, uninfected. Data are from 4 sections of colon 
measured at least 50 µm apart per animal from at least three individual mice per 
group. d, Haematoxylin and eosin stained sections of colon from different 
mouse strains infected with wild type C. rodentium. Scale bar, 100 µm. 
Representative images from two sections of colon at least 100 µm apart 
(transverse or longitudinal) from at least three individual mice per group. 
Extended Data Figure 9 | Bacterial load in the faeces of Fas(lpr/lpr) mice 
infected with wild-type C. rodentium, an espI mutant and an espF mutant. 
a, Bacterial load in the faeces of mice during infection with derivatives of 
C. rodentium. Each data point represents log10 c.f.u. per 100 mg faeces or colon 
as indicated per individual animal on days 4, 8, 10 and 12 post-infection. 
Mean ± s.e.m. are indicated, dotted line represents detection limit. 
b, Diarrhoea score at day 4, 8, 10 and 12 post-infection. Scoring system is 
described in the Methods. Mean ± s.e.m. are indicated. P values from one-way 
ANOVA. c, d, Bacterial load in the faeces of mice infected with derivatives of 
C. rodentium. Each data point represents log10 c.f.u. per 100 mg faeces per 
individual animal on day 8 post-infection. Mean ± s.e.m. are indicated, dotted 
line represents detection limit. P values from Mann-Whitney U-test. 
Stability and function of regulatory T cells is 
maintained by a neuropilin-1-semaphorin-4a axis 


Greg M. Delgoffe*, Seng-Ryong Woo*, Meghan E. Turnis, David M. Gravano, Cliff Guy, Abigail E. Overacre, 
Matthew L. Bettini, Peter Vogel, David Finkelstein, Jody Bonnevier, Creg J. Workman & Dario A. A. Vignali 


Regulatory T cells (Treg cells) have a crucial role in the immune sys- 
tem by preventing autoimmunity, limiting immunopathology, and 
maintaining immune homeostasis. However, they also represent a 
major barrier to effective anti-tumour immunity and sterilizing 
immunity to chronic viral infections. The transcription factor 
Foxp3 has a major role in the development and programming of 
Treg cells. The relative stability of Treg cells at inflammatory dis- 
ease sites has been a highly contentious subject. There is conside- 
rable interest in identifying pathways that control the stability of 
Treg cells as many immune-mediated diseases are characterized by 
either exacerbated or limited Treg-cell function. Here we show that 
the immune-cell-expressed ligand semaphorin-4a (Sema4a) and 
the Treg-cell-expressed receptor neuropilin-1 (Nrp1) interact both 
in vitro, to potentiate Treg-cell function and survival, and in vivo, at 
inflammatory sites. Using mice with a Treg-cell-restricted deletion 
of Nrp1, we show that Nrp1 is dispensable for suppression of auto- 
immunity and maintenance of immune homeostasis, but is required 
by Treg cells to limit anti-tumour immune responses and to cure 
established inflammatory colitis. Sema4a ligation of Nrp1 restrained 
Akt phosphorylation cellularly and at the immunologic synapse 
by phosphatase and tensin homologue (PTEN), which increased 
nuclear localization of the transcription factor Foxo3a. The Nrp1- 
induced transcriptome promoted Treg-cell stability by enhancing 
quiescence and survival factors while inhibiting programs that pro- 
mote differentiation. Importantly, this Nrp1-dependent molecular 
program is evident in intra-tumoral Treg cells. Our data support a 
model in which Treg-cell stability can be subverted in certain infla- 
mmatory sites, but is maintained by a Sema4a-Nrp1 axis, high- 
lighting this pathway as a potential therapeutic target that could 
limit Treg-cell-mediated tumour-induced tolerance without indu- 
cing autoimmunity. 

It was suggested initially that Treg-cell-mediated suppression was
contact-dependent, and soluble factors were insufficient, as purified
Treg cells failed to suppress across a permeable transwell membrane.
However, we have shown previously that additional signals, derived
from co-cultured T cells, are required to potentiate maximal Treg-cell
suppression by soluble factors across a transwell membrane. Indeed,
Treg cells stimulated in the presence of fixed or live CD4+ conventional
T cells (Tconv cells) in the top chamber of a transwell plate can suppress
Tconv cells in the bottom chamber (Supplementary Fig. 1a). To determine
the Tconv-cell-derived signals responsible for potentiating or boosting
Treg-cell-mediated suppression, we compared the Tconv- and Treg-cell
transcriptomes and identified Sema4a as a potential ligand that could
boost Treg-cell function (Supplementary Fig. 1b-d). Sema4a is involved
in neural patterning and has been implicated in modulating immune
function, with haematopoietic cell expression restricted to CD4+ and
CD8+ T cells, natural killer cells and dendritic cells. Using RNA
interference, loss-variant selection and overexpression in hybridomas, and
Figure 1 | Sema4a binds Nrp1 to potentiate Treg-cell function and survival
in vitro. a, Transwell suppression assay in which Treg cells were co-cultured in
the top chamber of a transwell plate with anti-CD3- and anti-CD28-coated
beads in the presence or absence of CD4+ or CD8+ Tconv cells that had been
transfected previously with scrambled siRNA or siRNA to Sema4a. Proliferation of
Tconv cells stimulated with anti-CD3- and anti-CD28-coated beads in the bottom
chambers was measured by [3H]-thymidine uptake. b, Transwell suppression
assay with Treg cells co-cultured in the top chamber with CD4+, CD8+ or CD11c+
cells including anti-Sema4a or its isotype control. c, Transwell suppression
assay in which Treg cells were co-cultured in the absence of Tconv cells but in the
presence of beads coated with Sema4a-IgG1 or its isotype control. d, ELISA-based
binding assay in which plates coated with Nrp1 were incubated with
Sema4a-IgG1 or its isotype control in the presence of various blocking antibodies.
Sema4a-IgG1 was detected using an isotype-specific antibody. e, Transwell
suppression assay in which Treg cells were purified by flow cytometry from
Foxp3Cre or Nrp1L/LFoxp3Cre mice. f, Annexin V-7-AAD staining of Treg cells
stimulated for 48 h in vitro in the presence of Sema4a-IgG1 or its isotype
control. Results represent the mean of five independent experiments. *P < 0.05,
**P < 0.01, ***P < 0.001 by unpaired t-test. Error bars indicate s.e.m.


1Department of Immunology, St Jude Children’s Research Hospital, Memphis, Tennessee 38105, USA. 2Integrated Biomedical Sciences Program, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA. 3Department of Pathology, St Jude Children’s Research Hospital, Memphis, Tennessee 38105, USA. 4Computational Biology, St Jude Children’s Research Hospital, Memphis, Tennessee 38105, USA. 5R&D Systems Inc., Minneapolis, Minnesota 55413, USA.
*These authors contributed equally to this work. 


252 | NATURE | VOL 501 | 12 SEPTEMBER 2013 


©2013 Macmillan Publishers Limited. All rights reserved 


antibody blockade, we confirmed that Sema4a was necessary for Tconv
and dendritic cells to potentiate Treg-cell-mediated transwell
suppression (Fig. 1a, b and Supplementary Fig. 2a-e). Sema4a is also sufficient
to potentiate suppression, as Treg cells cultured in the presence of beads
coated with a Sema4a-immunoglobulin-G1 (IgG1) fusion protein, instead
of fixed Tconv cells, potentiated transwell suppression (Fig. 1c). In
contrast, soluble Sema4a-IgG1 inhibited Tconv-cell-potentiated,
Treg-cell-mediated transwell suppression (Supplementary Fig. 2f). These data
demonstrate that Sema4a is required and sufficient to potentiate
Treg-cell function by soluble factors in vitro.

Nrp1 is a homogeneously expressed marker of thymically derived Treg
cells, and has also been shown to bind Sema3a, which shares homology
with Sema4a, and vascular endothelial growth factor (VEGF) in
mediating neural axon growth and angiogenesis (Supplementary Fig. 3a).
We therefore proposed that Nrp1 functions as a receptor for Sema4a
on Treg cells. Sema4a could directly and specifically bind to Nrp1, albeit
with slightly lower avidity than Sema3a, as demonstrated in an
enzyme-linked immunosorbent assay (ELISA)-based binding assay with
recombinant Nrp1 and Sema family proteins and blocking antibodies to Nrp1
or Sema4a (Fig. 1d and Supplementary Fig. 3b-d). We bred mice
harbouring a loxP-flanked (‘floxed’) Nrp1 gene (Nrp1L/L) to mice
expressing a yellow fluorescent protein (YFP)-codon-improved Cre (iCre)
fusion protein driven by an IRES following Foxp3 (Foxp3YFP-iCre, which
we refer to as Foxp3Cre) to obtain a Treg-cell-restricted deletion of Nrp1
(Foxp3YFP-iCre × Nrp1L/L, which we refer to as Nrp1L/LFoxp3Cre).
Sema4a-IgG1 also bound to control but not Nrp1-deficient Treg cells in a flow
cytometric assay (Supplementary Fig. 3e). These data support a direct
interaction between Nrp1 and Sema4a.

Although Nrp1-deficient Treg cells develop normally and can suppress
in a ‘classical’ contact-dependent suppression assay (Supplementary
Fig. 4a), they were unable to suppress across a transwell membrane in
the presence of either Tconv cells or Sema4a-IgG1-coated beads (Fig. 1e).
Transwell suppression can be blocked by anti-Nrp1 and, as reported
previously, is dependent on interleukin-10 (IL-10) and IL-35
(Supplementary Fig. 4b-d). Sema4a-Nrp1 ligation substantially reduced
cell death and modestly increased proliferation, and, although
single-cell levels of IL-10 and IL-35 did not change, increased survival
resulted in greater amounts of IL-10 and IL-35 in Sema4a-treated Treg-cell
cultures (Fig. 1f and Supplementary Fig. 4e-i). Expression of NRP1 on
human Treg cells has been a contentious subject. We observed
sustained NRP1 expression, albeit modest, on optimally suppressive Treg
cells that could mediate transwell suppression in response to human
SEMA4A in a NRP1-dependent manner (Supplementary Fig. 5).
Taken together, these data suggest Sema4a-Nrp1 interaction promotes
Treg-cell survival and function.

Nrp1L/LFoxp3Cre mice did not show any autoimmune phenotype for
at least 16 months after birth (data not shown). In addition, the
capacity of Foxp3Cre and Nrp1L/LFoxp3Cre Treg cells to limit the development
of the autoimmune sequelae caused by Foxp3 deletion was comparable
(Supplementary Fig. 6), suggesting that maintenance of immune
homeostasis and prevention of autoimmunity may not require Nrp1 signalling.

We reasoned that Nrp1 signalling may regulate Treg-cell function
under inflammatory conditions. Treg cells are recruited to and
induced by tumours, and consequently hamper protective anti-tumour
immunity. Foxp3DTR-GFP mice, which allow for conditional Treg-cell
deletion following diphtheria toxin treatment, can clear MC38
adenocarcinoma, EL4 thymoma, and B16 melanoma tumours when treated
with diphtheria toxin at the time of inoculation, although they invariably
succumb to autoimmune disease (Fig. 2a-c). Notably, Nrp1L/LFoxp3Cre
mice showed reduced, delayed tumour growth and increased survival,
particularly with B16 melanoma, without any detectable autoimmune
consequences (Fig. 2a-c and data not shown). In wild-type C57BL/6
mice, blockade of this pathway using Sema4a monoclonal antibody,
Nrp1 monoclonal antibody (which does not block Nrp1-VEGF
interaction), and Sema4a-IgG1 (which acts as a soluble antagonist),
significantly decreased tumour growth (Fig. 2d-f and Supplementary
Fig. 2f, 3b, c). Nrp1-deficient Treg cells also failed to suppress clearance
of B16 lung metastases, even with very high tumour cell inoculates (Fig. 2g
and Supplementary Fig. 7a). Nrp1L/LFoxp3Cre mice displayed increased
intratumoral CD8*~ T cells, particularly in the IFN-y* IL-2° TNF-a 
tumoricidal subset (Fig. 2h). Although we originally presumed the
source of Sema4a would be tumour-infiltrating T cells and
conventional dendritic cells, the majority of Sema4a+ cells infiltrating tumours
were plasmacytoid dendritic cells (57.4% of intratumoral Sema4a+ cells),
consistent with previous suggestions highlighting mechanistic links
between Treg cells and plasmacytoid dendritic cells in mediating
tumour-induced tolerance (Fig. 2i and Supplementary Fig. 7b, c). Indeed,
plasmacytoid dendritic cells can potentiate Treg-cell function in vitro
in a Sema4a-Nrp1-dependent manner (Supplementary Fig. 7d). Treatment
with Sema4a-IgG1 also resulted in an increased number of intratumoral
CD8+ T cells, consistent with observations with Nrp1L/LFoxp3Cre mice
(Supplementary Fig. 7e). Nrp1-deficient Treg cells also fail to cure
established inflammatory colitis, suggesting that the utilization of this
pathway is not restricted to the tumour microenvironment
(Supplementary Fig. 8). Thus, although Nrp1 seems to be dispensable for
regulating immune homeostasis, it is required for maximal
Treg-cell-mediated control of inflammatory environments.

We next sought to determine the signalling pathway downstream of
Nrp1. Given the importance of limiting Akt-serine/threonine-protein
kinase mTOR signalling in Treg-cell function, and previous suggestions
that Nrp1 modulates Akt signalling, we proposed that Nrp1 inhibits
Akt function in Treg cells. Indeed, T-cell receptor (TCR)- and
CD28-activation-induced whole-cell Akt-mTOR signalling (as determined
by phosphorylation of Akt and S6K1) was reduced to baseline levels in
freshly isolated Treg cells (and by 47-54% in IL-2-expanded Treg
cells) by Sema4a-mediated Nrp1 ligation (Supplementary Fig. 9a, b).
Furthermore, Treg-cell activation on stimulatory lipid bilayers
containing Sema4a-IgG1, but not an isotype control, recruited Nrp1 to
the immunologic synapse and inhibited immunologic synapse Akt
phosphorylation, while sparing global immunologic synapse tyrosine
phosphorylation (Fig. 3a and Supplementary Fig. 9c, d). Retroviral
overexpression of a dominant-negative Akt mutant in Treg cells
expanded with IL-2 limited the requirement for Nrp1 ligation, at least in
transwell suppression assays, suggesting that Akt may be a dominant
pathway that limits Treg-cell transwell suppression (Fig. 3b and
Supplementary Fig. 9e). As immunologic synapse phosphorylated-Akt
diminution in response to Sema4a was rapid, we reasoned that Nrp1
recruits the phosphatase PTEN to the immunologic synapse,
restraining Akt-mTOR signalling. Nrp1 constitutively bound PTEN in
resting Treg cells, which was reduced upon activation but maintained in
the presence of Sema4a-Nrp1 interaction (Fig. 3c). PTEN-deficient
Treg cells failed to inhibit Akt phosphorylation at the immunologic
synapse in the presence of Sema4a, and failed to suppress across a transwell
membrane in response to Sema4a- or Tconv-cell-mediated potentiation
(Fig. 3d and Supplementary Fig. 10a, b). Nrp1 has a small
cytoplasmic domain consisting of an evolutionarily conserved PDZ
domain-binding motif (carboxy-terminal amino acid sequence: Ser-Glu-Ala).
Nrp1 mutants lacking this motif could not inhibit Akt phosphorylation
at the immunologic synapse or recruit PTEN (Fig. 3e and
Supplementary Fig. 10c, d). Activated Akt can phosphorylate Foxo transcription
factors, promoting their exclusion from the nucleus. Foxo
transcription factors are critical for Treg-cell development, through interaction
with Foxp3 as well as inducing several Treg-cell signature genes.
Indeed, Sema4a-IgG1 limited the activation-induced nuclear export of
Foxo3a (Fig. 3f). We propose that, during activation, Nrp1 ligation
restrains Akt phosphorylation through PTEN, facilitates Foxo nuclear
localization, and thereby potentiates Treg-cell function.
Gene-expression analysis revealed an Nrp1-induced transcriptional
profile that was consistent with promoting Treg-cell survival, stability
and quiescence, and that was similar to a Foxo-dependent
transcriptional signature (Fig. 3g). Gene Ontology and Gene Set Enrichment
Analysis revealed that Nrp1 ligation modulated multiple pathways and
programs, including the IL-2- and IL-7-related transcriptional
programs, repression of cytokine transcripts and modulated Foxp3
targets (Fig. 3g, Supplementary Fig. 11-13, and Supplementary Tables 1
and 2). Of particular interest was the stabilization of the transcription
factor Kruppel-like Factor 2 (KLF2) (and its targets Sell, Ccr7, Il7r), the
Treg-cell regulator Helios (Ikzf2; also known as IKAROS family zinc
finger 2), and the anti-apoptotic protein Bcl2, accompanied by a
concomitant repression of the lineage-defining transcription factors Eomes,
IRF4 and RORγt (nuclear receptor ROR-γt, encoded by Rorc),
suggesting a role for Nrp1 in stabilizing the Treg-cell program and repressing
terminal differentiation (Fig. 3g and Supplementary Fig. 13b-d).
Finally, we sought to determine whether the molecular fingerprints
of Nrp1 signalling could be observed in vivo. Tumour-infiltrating Treg
cells restrained Akt phosphorylation in an Nrp1-dependent manner
(Fig. 4a). Likewise, tumour-infiltrating Treg cells showed Nrp1-dependent
upregulation of Helios and downregulation of IRF4 and RORγt (Fig. 4b
and Supplementary Fig. 14a, b). This was associated with an increase
in intratumoral Treg-cell proliferation, as revealed by Ki67-BrdU
(5-bromodeoxyuridine) staining, reduced caspase-3-dependent
programmed cell death, and increased expression of the anti-apoptotic protein
Bcl2 (Fig. 4c-e and Supplementary Fig. 14c-e). We also observed an
Nrp1-dependent increase in the percentage of ICOS+, IL-10+ and
CD73+ intratumoral Treg cells (Fig. 4f-h and Supplementary Fig. 14f-h).
Taken together, our data support a role for Nrp1 in modulating
Treg-cell stability, survival and function in certain tumour microenvironments.
Our data demonstrate that cell contact-dependent potentiation of
Treg-cell stability and function is mediated by Sema4a-Nrp1 ligation
through a PTEN-Akt-Foxo axis (Supplementary Fig. 15). This pathway
enhances Treg-cell function indirectly by enforcing stability and
promoting survival, and this is most evident in inflammatory sites such as
certain tumours and colitic intestinal mucosa. However, Nrp1
signalling may also boost Treg-cell function directly by enhancing some
suppressive mechanisms (for example, CD73). Apart from haematopoietic
lineages, Sema4a seems to have a pattern of expression consistent with
sites in which Treg-cell tolerogenic activity would be desired, such as
the nervous system, eye and intestine. The issue of Treg-cell stability
has been highly contentious, and the mechanisms that maintain
Treg-cell stability remain elusive. As Foxo family members enhance Foxp3
function and promote Treg-cell homeostasis and function, it is
intriguing that Nrp1 signalling may counteract the negative impact of Akt
on Foxo nuclear localization, resulting in substantial overlap between
the transcriptional profiles induced by Foxo and Nrp1 signalling. It is
possible that the Nrp1-Sema4a pathway may be perturbed genetically
or under certain pathologic circumstances; this could also provide a
basis for the seemingly contradictory perceptions of Treg-cell stability
in a variety of normal and diseased states.

Previous studies have shown that plasmacytoid dendritic cells
promote tolerance, Treg-cell differentiation and function, and their ablation
has been shown to correlate with enhanced antitumour immunity.
Given that the dominant intratumoral source of Sema4a was
plasmacytoid dendritic cells, this raises the possibility that Nrp1-induced
Treg-cell stability and survival provides a mechanistic explanation for these
observations. As Treg cells represent a major barrier to anti-tumour
immunity in many cancers, a clinically relevant and critical question is
whether it is possible to limit Treg-cell function in tumours while
preventing inflammatory or autoimmune adverse events. Recently, a role
for Nrp1 in Treg cells was proposed to limit tumour growth in the
MT/ret murine melanoma model, although contrary to the findings in
that study we did not observe any differences in Treg-cell prevalence
in B16 melanoma tumours (Fig. 2h). However, further studies will be
required to delineate in which tumours and under what conditions
these two disparate functions of Nrp1 in Treg cells (regulation of
Treg-cell migration and the maintenance of Treg-cell survival and stability)
are used. Our identification of Nrp1-Sema4a as a pivotal pathway required
for intratumoral Treg-cell stability, but dispensable for the
maintenance of immune homeostasis, suggests that Sema4a-Nrp1 blockade via
antibodies or soluble antagonists may be a viable therapeutic strategy
to limit tumour-induced tolerance without evoking autoimmunity.


METHODS SUMMARY 

Mice. C57BL/6 and dnTGFβRII mice were purchased from the Jackson
Laboratories. Foxp3YFP-iCre, Foxp3− and Foxp3DTR-GFP mice were obtained from A.Y.
Rudensky. Il10−/− mice were obtained from T. Geiger. Nrp1L/L mice were obtained
from D. Cheresh. PtenL/L × Foxp3YFP-iCre mice were obtained from H. Chi. Animal
experiments were performed in American Association for the Accreditation of 
Laboratory Animal Care-accredited, specific-pathogen-free facilities in the St Jude 
Animal Resource Center. Animal protocols were approved by the St Jude Animal 
Care and Use Committee. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS

Mice. C57BL/6 and dnTGFβRII mice were purchased from the Jackson Laboratories.
Foxp3YFP-iCre, Foxp3− and Foxp3DTR-GFP mice were obtained from A.Y. Rudensky.
Il10−/− mice were obtained from T. Geiger. Nrp1L/L mice were obtained from D.
Cheresh. PtenL/L × Foxp3YFP-iCre mice were obtained from H. Chi. Animal
experiments were performed in American Association for the Accreditation of
tory Animal Care-accredited, specific-pathogen-free facilities in the St Jude Animal 
Resource Center. Animal protocols were approved by the St Jude Animal Care and 
Use Committee. 

Antibodies. Sema4a staining antibody was purchased from MBL (clone 5E3), and 
conjugated to biotin or Alexa Fluor 647 in-house. Polyclonal anti-Nrp1 was pur- 
chased from R&D Systems (AF566). Monoclonal antibodies were obtained from 
R&D Systems (Sema4a, 757129; Nrp1, 761704, MAB59941). Most flow cytometric 
antibodies were purchased from BioLegend. Anti-Foxp3 and anti-Eomes were pur- 
chased from eBioscience. KLF2 antibody was purchased from Millipore. Phospho- 
Akt (Ser473), phospho-S6K1 (Thr421-Ser424), Foxo3a, and pan Akt antibodies 
were purchased from Cell Signaling Technologies. PTEN-HRP (horseradish per- 
oxidase) antibody was purchased from Santa Cruz Biotechnology. 

RNA interference. Control siRNA (catalogue no. 4390843) and pools of Sema4a
(catalogue no. 4390771, siRNA no. s73547) siRNA were purchased from Life
Technologies and resuspended per the manufacturer’s instructions. CD4+ and CD8+
conventional T cells were sorted magnetically by negative selection and transfected
by Amaxa (Lonza) with 300 pmol siRNA and 2 µg of pMaxGFP control plasmid, then
rested overnight in Amaxa nucleofector media. Cells were then sorted based on
GFP, CD25, and CD45RB expression and co-cultured with Treg cells in the top well
of a transwell suppression assay.

Plasmids. Nrp1.mCherry was obtained from Addgene and used as a template to
generate retroviral overexpression constructs. Nrp1WT was generated by adding the
native signal sequence and cloned into pMICherry (MSCV-driven retroviral
construct with an IRES-driven mCherry gene). Nrp1ΔSEA was generated from the WT
construct, deleting the terminal SEA motif by mutation of the serine codon to a
stop codon. AktWT, AktDN (dominant-negative kinase dead K179M; described
previously), and pBabe empty vector were obtained from D. R. Green.

Human T-cell populations. Human umbilical cord samples were provided by B. 
Triplett, M. Howard and M. McKenna at the St Louis Cord Blood Bank, and were 
obtained from the umbilical vein immediately after vaginal delivery with the infor- 
med consent of the mother and approved by St Louis Cord Blood Bank Institu- 
tional Review Board (IRB). Research use was approved by the St Jude Children’s 
Research Hospital IRB. 

Transwell suppression. 1.25 × 10^4 Treg cells purified flow cytometrically (CD4+
CD45RBlo Foxp3YFP-iCre+) were stimulated in the top chamber of a Millipore Millicell
96 (0.4 µm pore size) in the presence of flow cytometrically purified Tconv cells
(CD45RBhi CD25− CD4+ or CD8+), B cells (B220+), or Treg cells at a 1:4 ratio,
Sema4a-IgG1- or IgG-conjugated latex beads (1:1 ratio), anti-CD3ε (145.2C11)
and anti-CD28 (37.51) conjugated latex beads (1:1 ratio), and/or neutralizing
antibodies. In some experiments, the top well co-cultured cells were fixed with 2%
paraformaldehyde for 15 min and washed extensively before co-culture with Treg
cells. Purified Tconv cells (5 × 10^4) were stimulated in the bottom well with
anti-CD3- and anti-CD28-coated beads at a 1:1 ratio. Cells were cultured for 72 h and
pulsed with [3H]-thymidine for the final 8 h. The bottom chambers were harvested
and read with a beta counter.

For human studies, flow cytometrically purified umbilical cord blood Tconv cells
(CD4+ CD25−) and Treg cells (CD4+ CD25+) were activated with 3 µg ml−1
plate-bound anti-CD3 (clone OKT3), 2 µg ml−1 soluble anti-CD28 (clone CD28.1), and
100 units per ml rhIL-2 for 7-9 days. After collection and washing, Treg cells were
stimulated at a 1:10 ratio with fixed autologous Tconv cells or IgG- or
Sema4a-IgG1-coated latex beads in the top well of a transwell plate. 2.5 × 10^4 Tconv cells were
stimulated in the bottom well at a 1:1 ratio with OKT3- and CD28.1-coated latex
beads. Cells were cultured for 5 d and pulsed with [3H]-thymidine for the final 8 h.
The cells from the bottom chambers were collected and read with a beta counter.
Transwell suppression is defined as 100 − 100 × ((counts per min of a particular well) /
(average counts per min of unsuppressed cells)) to normalize across experiments.
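The normalization defined above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not the authors' analysis code; the function name `transwell_suppression` and the example counts are hypothetical.

```python
from statistics import mean


def transwell_suppression(well_cpm, unsuppressed_cpm):
    """Percent transwell suppression for one well.

    Implements 100 - 100 * (c.p.m. of the well / mean c.p.m. of
    unsuppressed wells), as defined in the text. `unsuppressed_cpm` is a
    sequence of counts from wells without suppressing cells (assumed input).
    """
    baseline = mean(unsuppressed_cpm)
    if baseline <= 0:
        raise ValueError("mean unsuppressed c.p.m. must be positive")
    return 100.0 - 100.0 * (well_cpm / baseline)


# Hypothetical counts: a well at a quarter of the unsuppressed baseline.
print(transwell_suppression(2500, [10000, 10000]))
```

Dividing by the mean of the unsuppressed wells within each experiment is what allows values from different experiments, with different absolute counts, to be compared on one scale.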
Fusion proteins. The sequences encoding the extracellular domains of Sema4a and
Nrp1 were cloned in-frame to pX-Ig to create a Sema4a- or Nrp1-mouse IgG1
fusion protein construct (Sema4a-IgG1 and Nrp1-IgG1). J558L B cells were
electroporated with this construct, and high-producing clones were selected by
single-cell sorting. High-producing clones were seeded into Sartorius Bioreactors and
collected for protein G purification and concentration. Sulphate latex 4-µm beads
(Life Technologies) were conjugated with isotype control (mouse IgG1, MOPC21,
R&D Systems) or Sema4a-Ig overnight with 3 pg protein per bead, blocked with
10% FBS, and stored in media. Mouse Sema-3a-Fc, Sema4a-Fc, mouse Nrp1, and
human Sema4a-Fc were purchased from R&D Systems.
Binding assays. High protein binding ELISA plates were coated with 500 ng ml−1
recombinant murine Nrp1 (R&D Systems) overnight in PBS. After a 1-2-h block
in 1% BSA in PBS at 4 °C in the dark, coated plates were incubated with various
concentrations of Sema4a-IgG1 or mouse IgG1 for 2-4 h in the presence of
anti-Sema4a, anti-Nrp1, or isotype control antibodies. Plates were then washed with
PBS plus 0.05% Tween-20 10 times and incubated with 500 ng ml−1 biotinylated
anti-mouse IgG1 antibody (BD Biosciences) to bind the fusion protein (or mouse
IgG1 control). After 7 washes, streptavidin-HRP (GE Healthcare) was added at
500 ng ml−1 to detect the biotinylated antibody. After another 7 washes, TMB
substrate (Thermo Scientific) was added and stopped with 1 N H2SO4.

For VEGF binding, the same protocol was followed, except that rather than
Sema4a-IgG1 being used, VEGF165 (R&D Systems) was used at 50 ng ml−1 in PBS
and detected with 500 ng ml−1 anti-VEGF-biotin (R&D Systems) followed by
SA-HRP for detection.

For comparisons across Sema family members, plates were coated with varying 
concentrations of Sema3a-Fc, Sema4d-Fc, Sema4a-IgG1, or isotype control over- 
night. Biotinylated Nrp1-IgG1 was added and incubated for 3 h, and SA-HRP was 
used for detection. 
mRNA analysis. RNA was extracted from cells lysed in TRIzol reagent (Life Tech- 
nologies) and reverse transcribed with the High Capacity Reverse Transcription 
kit (Applied Biosystems). Real-time polymerase chain reaction (PCR) was carried 
out using primers and probes and TaqMan master mix or SYBR green chemistry 
(Applied Biosystems). 

Rescue of Foxp3-deficient autoimmunity. CD45.1 × Foxp3+/− female mice were
bred with CD45.1 male mice in timed breedings. Male progeny were genotyped at
birth for Foxp3− status. Purified Foxp3Cre (1 X 10°) or Nrp1L/LFoxp3Cre CD45.2+
Treg cells, purified by flow cytometry, were injected intraperitoneally into Foxp3−
male pups within 3 days of birth. Mice were monitored for the scurfy phenotype
(scaly skin, eye inflammation, runted phenotype, and lack of mobility). For some
experiments, all mice were killed at 5 weeks for histological analysis of the ear
pinna, liver and lung.

Tumour models. Foxp3Cre, Nrp1L/LFoxp3Cre, or Foxp3DTR-GFP mice were injected
with B16.F10 melanoma (1.25 X 10° cells intradermally), EL4 thymoma (1.25 X 10°
cells intradermally), or MC38 colon carcinoma (2.5 X 10° cells subcutaneously).
Tumours were measured regularly with digital calipers and tumour volumes were
calculated; this was done blind but not randomized. Tumours and lymph nodes
were collected for analysis. TILs were prepared using a Percoll gradient from
tumour samples after mechanical disruption. For metastasis studies, B16.F10 was
injected intravenously at various doses. After 17-20 days, lungs were harvested,
inflated with H2O2, and metastases were counted. Therapeutic B16 experiments
were conducted by injecting 1.25 X 10° B16 melanoma cells intradermally and
waiting until tumours were palpable (5 days). On day 5, mice began to receive
intraperitoneal injections of either rat IgG2a, or anti-Nrp1 (R&D Systems clone
761704, MAB59941) (400 µg initial dose and 200 µg every 3 days). Prophylactic
experiments included anti-Sema4a (R&D Systems clone 757129) and
Sema4a-IgG1 consisting of twice weekly injections of 100 µg of protein starting on the day
of tumour inoculation. To achieve reasonable power, at least 15 mice were used in
each group, at least 5 mice per experiment. Additional mice were added to
experiments as appropriate.

Experimental colitis. Six- to eight-week-old RAG2−/− mice were injected
intraperitoneally with 4 x 10° congenically marked CD45RBhi CD25− Tconv cells. When
the majority of the mice had lost 10% body weight and had colitis symptoms (21 to
28 days later), 1X 10° Foxp3Cre or Nrp1L/LFoxp3Cre Treg cells were injected
intraperitoneally. Mice that did not lose 10% body weight at this time received no
injection and were excluded from analysis, and mice received randomized injections
(different genotypes of transferred cells per cage). Body weight was measured daily
in a blinded fashion, and 28 days after Treg-cell rescue, sections were stained for
histology. To achieve reasonable power, at least 15 mice were used in each group, and at
least 5 mice per experiment. Additional mice were added to experiments as appropriate.
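As an illustration only, the 10% body-weight criterion used to decide which mice received Treg cells could be expressed as below. The function names, the mouse identifiers and the weights are hypothetical, and treating "lost 10%" as "lost at least 10% of the starting weight" is an assumption rather than something stated in the text.

```python
def percent_weight_loss(baseline_g, current_g):
    """Percent of baseline body weight lost (negative if weight was gained)."""
    if baseline_g <= 0:
        raise ValueError("baseline weight must be positive")
    return 100.0 * (baseline_g - current_g) / baseline_g


def eligible_for_rescue(baseline_g, current_g, threshold_pct=10.0):
    """True if the mouse has lost at least `threshold_pct` of baseline weight.

    The 10% default follows the text; the >= comparison is an assumption.
    """
    return percent_weight_loss(baseline_g, current_g) >= threshold_pct


# Hypothetical weights in grams: (baseline, weight at time of transfer).
mice = {"m1": (20.0, 17.0), "m2": (20.0, 19.0)}
included = [m for m, (b, c) in mice.items() if eligible_for_rescue(b, c)]
print(included)
```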
Signalling analysis. For flow cytometry, Treg cells were stimulated with anti-CD3-
and anti-CD28-coated beads and IgG1-coated or Sema4a-IgG1-coated beads
overnight, then fixed with 1% PFA for 15 min at 37 °C. Cells were then permeabilized
in ice-cold 90% MeOH for 20 min at −20 °C. After extensive washing in PBS, cells
were blocked with 10% normal mouse serum in PBS for 10 min at room tempera- 
ture. Cells were then stained with antibodies in 1% BSA in PBS (phosphorylated 
S6K1 (pS6K1) (Thr421-Ser424), pAkt (Ser473), or pAkt (Thr308)) for 1 h at room 
temperature, in the dark. Finally, cells were stained with appropriate secondary 
antibodies for 30 min at room temperature, in the dark, then washed and analysed. 
For immunoblot analysis, Treg cells were expanded with 1 ng ml−1 phorbol
12-myristate 13-acetate (PMA) and 10 ng ml−1 ionomycin with 500 units recombinant human
interleukin-2 (IL-2) for 3 days, then washed extensively with media, and expanded
to 10× volume in 500 units recombinant human IL-2. After an overnight rest with
no IL-2, Treg cells were stimulated with plate-bound anti-CD3, soluble anti-CD28,
and bead-bound Sema4a-IgG1 for 6 h, and lysed in whole-cell lysis buffer (1%
NP40, 5 mM EDTA, 5 mM EGTA, Tween-20) for 15 min on ice, then subjected to
immunoblot analysis. In some experiments, 1-3 X 10° Treg cells were lysed in a
larger volume, and cleared. Nrp1 was immunoprecipitated using a polyclonal 
anti-Nrp1 antibody (R&D Systems, AF566) 6-16 h followed by a 3h incubation 
with Protein G beads. Beads were washed with lysis buffer before elution. In brief, 
precipitates or input lysates were incubated at 96°C with 0.1 M dithiothreitol 
(DTT) and 4x LDS sample buffer (Life Technologies), then loaded into 4-12% 
Bis-Tris NuPAGE gels (Life Technologies), and run for 1 h at 200 V. Separated gels 
were electrotransferred to polyvinylidene difluoride (PVDF) membranes using the 
Criterion Gel Blotting System (Biorad), and blocked for 1 h at room temperature 
with 3% BSA in Tris-buffered saline (TBS) supplemented with 0.1% Tween-20. 
Blocked membranes were incubated overnight with anti-PTEN conjugated direc- 
tly to HRP (Santa Cruz Biotechnologies), washed three times with TBS-Tween, and 
imaged using Western Lightning ECL (enhanced chemiluminescence). For other 
immunoblot analysis, blocked membranes were incubated with varying primary 
antibodies (anti-Nrp1, anti-phosphorylated or total Akt, anti-phosphorylated S6K1) 
overnight, washed three times with TBS-Tween, and incubated with appropriate 
HRP-conjugated secondary antibody controls. After three additional washes mem- 
branes were imaged using Western Lightning ECL. 

Retroviral transduction. 293T cells were transfected with pPAM-EQ and
pVSV-G packaging plasmids with various retroviral constructs to transduce GPE86
retroviral producer cells. Treg cells were purified flow cytometrically. Treg cells were activated
and cycled with PMA and ionomycin in the presence of 500 units per ml
recombinant human IL-2 for 24 h in 96-well flat-bottom plates at 5 × 10^4 per well in
100 µl. Viral supernatants were concentrated 10-fold using 100-kilodalton (kDa)
MWCO concentrators (Millipore) and added in equal volume to cycling Treg cells
in the presence of 500 units per ml recombinant human IL-2 and 6 µg ml−1
polybrene and centrifuged at 1,000g for 60 min at 37 °C, then incubated for 24 h. The
transduction process was repeated twice every 24 h, removing 100 µl of
supernatant from the cultured Treg cells each day to keep the culture volume at 200 µl per
well. Treg cells were then washed in media and sorted based on fluorescent protein
expression or selected with 1 µg ml−1 puromycin and expanded further in IL-2.
Fluorescent protein or intracellular epitope staining (anti-HA, Sigma) was
confirmed before use. Functional assays were performed after a 24 h rest without IL-2.
Microscopy. TIRF illumination of immunologic synapse activation was performed 
as described previously’’. In brief, lipid bilayers containing anti-TCR and an anti- 
mouse IgG1 capture antibody loaded with Sema4a-IgG1 or isotype control were 
prepared. Treg cells were stimulated on the bilayer for 20 min, then fixed, permea- 
bilized, and stained for phosphorylated Akt (Ser473), global phosphotyrosine 
(4G10), or Nrp1. The ‘percentage of pAkt+ TCR clusters’ represents the ratio of 
synapses positive for phosphorylated Akt (Ser473) to the total number of synapses 
formed, as read-out by TCR clustering. 

Foxo3a was carried out on freshly isolated Treg cells that were left unstimulated 
in media overnight or stimulated with immobilized anti-CD3 and anti-CD28 in 
the presence or absence of immobilized Sema4a-IgG1 or its isotype control. Cells 
were collected, fixed in 1% PFA, and permeabilized with 0.1% Triton X-100 in TBS. 


After blocking with normal mouse serum, cells were stained with anti-Foxo3a (Cell 
Signaling Technologies) overnight in Tris-buffered 1% BSA. After several washes, 
cells were stained with Alexa Fluor 647 conjugated anti-rabbit IgG (Life Techno- 
logies), and then washed several times. Cells were then loaded with 4′,6-diamidino- 
2-phenylindole (DAPI) and phalloidin-Alexa Fluor 546 or 488 before microscopy. 
Random fields of 10 to 30 cells were visualized using spinning-disc laser-scanning 
confocal microscopy. Blinded masks were generated using phalloidin and DAPI 
staining to determine cytoplasmic and nuclear volume, respectively, and only then 
was the Foxo3a staining visualized. The nuclear and cytoplasmic volumes of Foxo3a 
fluorescence of 20 to 30 stacks were calculated using Slidebook (3i) software in 
arbitrary fluorescence units and analysed in Graphpad Prism. 

Affymetrix array and analysis. Foxp3“” or Nrp1'"Foxp3\” Treg cells were flow 
cytometrically sorted to 99.0% purity from 6- to 8-week-old mice, and stimulated 
for 48h with plate-bound anti-CD3, anti-CD28, 100 units per ml recombinant 
human IL-2, and either isotype or Sema4a-IgG1 -coated latex beads. Cells were col- 
lected, washed three times with PBS, and lysed in TRIzol reagent (Life Technolo- 
gies). Quality was confirmed by ultraviolet spectrophotometry and by analysis on 
an Agilent 2100 Bioanalyzer (Agilent Technologies). Total RNA (100 ng) was pro- 
cessed and labelled in the Hartwell Center for Biotechnology & Bioinformatics 
according to the Affymetrix 3’ IVT Express protocol and arrayed on a mouse high- 
throughput 430 PM GeneChip array. Signal data were RMA summarized, visua- 
lized, quality checked by principal components analysis (PCA) (Partek Genomics 
Suite 6.6). Batch correction was applied as needed to correct differences in com- 
pletely replicated experiments scanned on distinct dates. To compare Tconv cells to 
resting Treg cells, an unequal-variance t-test was applied to each probe set and 
the log2 ratio calculated. This same analysis was used to compare Tconv cells to acti- 
vated Treg cells. To compare the effect of Sema4a treatment in wild-type Treg cells 
to the effect of Sema4a treatment in Nrp1-deficient cells, a two-factor analysis of 
variance (ANOVA) interaction of treatment and genotype was applied to each 
probe set and the Storey q value was found to correct for multiple comparisons. 
The categorical mean of each probe set was found, transformed to a Z-score, hie- 
rarchically clustered and visualized by heat-map in Spotfire DecisionSite 9.1 
(Tibco). The heat map in Fig. 5f was composed of the top named genes that 
passed the P-value interaction FDR at 10%, had a minimum mean expression of 6 
in one class and a minimum absolute log ratio difference of at least 0.5. The 
volcano plots were generated using STATA/s.e. 11.1. For all volcano plots, genes 
without official symbols or names were removed. In these plots, score refers to the 
-log-base-10-transformed P value. For genes in the interaction volcano plot, a metric 
for distance from the origin was applied to colour code the graph |(score/10 + |log 
ratio difference|)/2| > 0.5. Statistical tests and multiple comparison corrections 
were performed using Partek Genomics Suite 6.6. Sequences were retrieved for 
probe sets that showed at least a threefold difference between Tconv and activated 
Treg cells and a P value of 0.01, and these sequences were then tested with SignalP 
3.0 software to identify transmembrane domains. 
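
The scoring rules in this paragraph can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the analysis was run in Partek, Spotfire and STATA); the function names are hypothetical, the use of a population standard deviation for the Z-score is an assumption, and reading "FDR at 10%" as q ≤ 0.10 is also an assumption.

```python
import math
from statistics import mean, pstdev


def z_scores(class_means):
    """Z-transform the categorical means of one probe set.

    Assumption: population standard deviation; the text does not say which.
    """
    mu = mean(class_means)
    sd = pstdev(class_means)
    if sd == 0:
        return [0.0 for _ in class_means]
    return [(value - mu) / sd for value in class_means]


def volcano_score(p_value):
    """Score as defined in the text: the -log10-transformed P value."""
    return -math.log10(p_value)


def volcano_distance(p_value, log_ratio_difference):
    """Distance metric from the text: |(score/10 + |log ratio difference|)/2|."""
    return abs((volcano_score(p_value) / 10 + abs(log_ratio_difference)) / 2)


def is_highlighted(p_value, log_ratio_difference, threshold=0.5):
    """A gene is colour coded when the distance metric exceeds 0.5."""
    return volcano_distance(p_value, log_ratio_difference) > threshold


def passes_heatmap_filter(q_value, class_means, log_ratio_difference):
    """Heat-map criteria from the text: interaction FDR at 10%, a mean
    expression of at least 6 in one class, |log ratio difference| >= 0.5."""
    return (
        q_value <= 0.10
        and max(class_means) >= 6
        and abs(log_ratio_difference) >= 0.5
    )
```

For instance, a gene with P = 1e-4 and a log ratio difference of 1.0 has a score of 4 and a distance of (0.4 + 1.0)/2 = 0.7, so it would be colour coded.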
Interactome map uncovers phosphatidylserine 
transport by oxysterol-binding proteins 


Kenji Maeda, Kanchan Anand, Antonella Chiapparino, Arun Kumar, Mattia Poletto, Marko Kaksonen & Anne-Claude Gavin 


The internal organization of eukaryotic cells into functionally spe- 
cialized, membrane-delimited organelles of unique composition 
implies a need for active, regulated lipid transport. Phosphatidyl- 
serine (PS), for example, is synthesized in the endoplasmic reticu- 
lum and then preferentially associates—through mechanisms not 
fully elucidated—with the inner leaflet of the plasma membrane’. 
Lipids can travel via transport vesicles. Alternatively, several protein 
families known as lipid-transfer proteins (LTPs) can extract a variety 
of specific lipids from biological membranes and transport them, 
within a hydrophobic pocket, through aqueous phases*”. Here we 
report the development of an integrated approach that combines 
protein fractionation and lipidomics to characterize the LTP-lipid 
complexes formed in vivo. We applied the procedure to 13 LTPs in 
the yeast Saccharomyces cerevisiae: the six Sec14 homology (Sfh) 
proteins and the seven oxysterol-binding homology (Osh) proteins. 
We found that Osh6 and Osh7 have an unexpected specificity for PS. 
In vivo, they participate in PS homeostasis and the transport of this 
lipid to the plasma membrane. The structure of Osh6 bound to PS 
reveals unique features that are conserved among other metazoan 
oxysterol-binding proteins (OSBPs) and are required for PS recog- 
nition. Our findings represent the first direct evidence, to our 
knowledge, for the non-vesicular transfer of PS from its site of 
biosynthesis (the endoplasmic reticulum) to its site of biological 
activity (the plasma membrane). We describe a new subfamily of 
OSBPs, including human ORP5 and ORP10, that transfer PS and 
propose new mechanisms of action for a protein family that is 
involved in several human pathologies such as cancer, dyslipidae- 
mia and metabolic syndrome. 

To determine LTP specificity, we purified LTP-lipid complexes 
from extracts of yeast strains expressing physiological amounts of 
LTPs fused to the tandem-affinity purification (TAP) tag (Supplemen- 
tary Fig. 1a)*°. This approach combines high-affinity purification 
using an immunoglobin G (IgG) resin, elution with a site-specific 
protease, and separation by size-exclusion chromatography (SEC). 
Proteins and lipids in the SEC fractions were then analysed by dena- 
turing gel electrophoresis (SDS-PAGE) and high performance thin 
layer chromatography (HPTLC) or mass spectrometry, respectively 
(Supplementary Figs 2-4). The SEC elution profiles discriminate any 
background binding to IgG and only lipids co-eluting with LTPs were 
considered to represent specific ligands. Importantly, the procedure 
allows LTPs to be maintained in conditions that closely approximate 
normal physiology and preserves their ability to form multiprotein 
complexes. For example, Sec14 was found to co-elute with a known 
interaction partner Ptc7 (systematic name Yhr076w), a 38-kDa type 
2C protein phosphatase (Supplementary Fig. 3). The analysis con- 
firmed many other known or predicted interactions, such as the ones 
between Kes1 (also known as Osh4, systematic name Ypl145c) and 
sterol’® and between Sec14 (systematic name Ymr079w) or Sfh1 (sys- 
tematic name Ykl091c; 64% primary sequence identity with Sec14) and 
phosphatidylinositol (PI)/phosphatidylcholine (PC)'' (Supplementary 
Figs 1b, 2 and 3). 


Among proteins with an OSBP domain, Osh6 (systematic name 
Ykr003w) and Osh7 (systematic name Yhr001w), two paralogues that 
have poor sterol-transfer activity in vitro, formed stoichiometric and 
specific complexes with PS but not with sterols (Supplementary Figs 1c, 
2 and 4). For Osh6, the amount of PS in the pull-downs was ~100-fold 
higher than in the controls Kes1 and Sec14, no other lipids co-eluted at 
significant levels and the protein only associated with PS species with 
long aliphatic chains (C34) (Supplementary Figs 1d and 4). Osh6 and 
Osh7 seem to participate in lipid metabolic pathways in vivo that are 
distinct from those of other OSBPs. Kes1 fused to green fluorescent 
protein (GFP) localizes to the cytosol’*, whereas Osh6-GFP resides in 
structures at the cell periphery that overlap with cortical endoplasmic 
reticulum (ER) and represent ER-plasma membrane (PM) contact 
sites'*(Supplementary Fig. 5a). Upon perturbations of either ergosterol 
or PI phosphate metabolism, Kes1-GFP translocated to juxtanuclear 
patches”’, whereas only mutations targeting PS metabolism specifically 
affected Osh6-GFP localization at the cell periphery (Supplementary 
Fig. 5b). These data indicate a new role for the OSBP family in the 
transport of PS, an important signalling lipid. This forms the basis for 
the more detailed mechanistic and structural studies discussed below. 

To confirm the interaction between Osh6 and PS, we produced 
Osh6 in Escherichia coli. The recombinant Osh6 co-purified with 
phosphatidylglycerol (PG) and phosphatidylethanolamine (PE), two 
abundant lipids in bacteria, and with PS (Supplementary Fig. 6a). 
These complexes were resistant to exposure to a non-ionic detergent, 
indicating that the ligands are poorly accessible. We measured Osh6 
lipid-binding specificity in vitro using liposomes containing PS, PG, 
PE, PI, cardiolipin (CA), ergosterol or mixtures of lipids originating 
from yeast membranes (in which PG exists in trace amounts"). We 
found that only PS bound efficiently to recombinant Osh6 and that PS 
could replace bacterially produced and co-purifying PG and PE 
(Fig. 1a, b). As a control in the same assay, recombinant Kes1 bound 
ergosterol, but not PS. This shows that in vitro, Osh6 specifically binds 
PS and exchanges lipids with membrane bilayers. 

We then determined whether Osh6/Osh7 can transfer PS between 
membranes in vitro. For this purpose, we used either recombinant 
Osh6 (Fig. 1c and Supplementary Fig. 6b) or native Osh6-TAP and 
Osh7-TAP (Fig. 1d) purified from yeast. We designed two different 
lipid-transfer assays. First, we monitored the transfer of lipids from 
‘heavy’ donor liposomes (that contained different putative lipid car- 
goes) to ‘light’ acceptor liposomes (Fig. 1c, d). The donor liposomes 
were filled with sucrose so that they could readily be separated by 
centrifugation’. In the second assay, we measured the amount of PS 
transferred from donor to immobilized and fluorescently labelled (green) 
acceptor liposomes. We visualized the presence of PS in the acceptor 
liposome with a specific probe, the fluorescently labelled annexin V 
(red, Cy3)(Supplementary Fig. 6b). Overall, the data demonstrate that 
in vitro both Osh6 and Osh7, but neither Kes1 nor Sec14, specifically 
transfer PS between donor and acceptor membranes. 

Osh6 and Osh7 localize at membrane contact sites that bridge 
PS synthesis (the ER) with areas of PS accumulation and biological 
Figure 1 | Recombinant Osh6/Osh7 bind and specifically transfer PS in 
vitro. a, Osh6 specifically associates with phosphatidylserine (PS) and 
phosphatidylglycerol (PG). Osh6-associated lipids were determined by 
HPTLC. Liposomes (2% of the input) were loaded as references. b, Osh6 
extracts PS from liposomes containing yeast total lipid extract. Kes1 associates 
with ergosterol (Erg). Control: without lipid-transfer proteins (LTPs). 

c, Recombinant Osh6 specifically transfers PS from heavy donor liposomes to 
light acceptor liposomes. d, Transfer of PS from the donor to acceptor 
liposomes by Osh6-TAP and Osh7-TAP purified from yeast. CA, cardiolipin; 
PE, phosphatidylethanolamine; PI, phosphatidylinositol. Error bars represent 
standard deviation, n = 3. 


activity (the PM)'*(Supplementary Figs 5a and 7a). We first examined 
whether Osh6 and Osh7 contribute in vivo to the subcellular compart- 
mentalization of PS by measuring the content of the lipid in different 
subcellular membranes (Fig. 2a and Supplementary Fig. 7a). In wild- 
type cells, PS is more concentrated in the PM- than in the ER-enriched 
fractions. By contrast, in the absence of Osh6/Osh7, PS accumulation 
at the PM was reduced (~30%), whereas the distribution of other 
glycerophospholipids was largely unaffected. We also performed addi- 
tional tests using an in vivo probe for cellular PS: the lactadherin C2 
domain fused to GFP (Lact C2-GFP)'. Consistent with the biochem- 
ical data, PS predominantly localized at the PM in the unperturbed, 
osh6Δ, osh7Δ and kes1Δ (control) strains’, where it laterally par- 
titioned into domains (Fig. 2b and Supplementary Fig. 7a). Deletion 
of both Osh6 and Osh7 induced PS delocalization from the PM and the 
Lact C2-GFP probe adhered to intracellular membrane structures repre- 
senting the ER and vacuoles (Fig. 2b and Supplementary Fig. 7a, b). PS 
mislocalization was rescued by the reintroduction of wild-type Osh6, 
but not the mutants Osh6 Lys126Ala or Leu69Asp that are deficient in 
PS binding (see below; Supplementary Fig. 7c) or of an engineered Osh6 
that localizes at vacuolar membranes (see below; Supplementary Fig. 7d). 
Collectively, these results indicate that in vivo, active Osh6/Osh7 at the 
ER-PM contact sites contribute to the delivery and accumulation of 
PS at the PM. 

PS is also transported from the ER to sites of PS decarboxylation, 
where PS serves as a precursor for the biosynthesis of PE and PC 
(Fig. 2c). If Osh6 and Osh7 contribute to PS decarboxylation, then 
their deletions should lead to a drop in the cellular levels of PE/PC. 
However, in osh6Δosh7Δ cells, the total amounts of PE/PC were 
largely unaffected (Fig. 2a). We observed instead a significant decrease 
in PS levels that could be reversed by the deletion of the main decar- 
boxylase in yeast, the mitochondrial Psd1 or the addition of choline 
(a precursor for the Kennedy pathway, that is, an alternative, PS- 
independent route of PC synthesis). These results disprove a major 
role for Osh6/Osh7 in the delivery of PS to the main sites of decarboxy- 
lation in yeast’®. This is also consistent with the notion that Osh6/Osh7 
contribute to the accumulation of PS at the PM so that it avoids 
decarboxylation in the mitochondria. Our data support the view that 
there are distinct routes for regulated PS delivery to its different sites 
of biological activity (Fig. 2c). Cellular processes implying intracellu- 
lar PS transport, that is, PS decarboxylation into PE (mitochondria) 
or the vesicular trafficking-, cell cycle-dependent polarization of PS 
at the buds*"*, were largely unaffected in osh6Δosh7Δ strains (data 
not shown). 

We then investigated whether Osh6/Osh7 directly transport newly 
synthesized PS between cellular membranes in vivo. We first depleted 
endogenous PS by deleting the PS synthase (Cho1, systematic name 
Yer026c). Lyso-PS (derivatives of phosphatidylserine in which one 
acyl chain has been removed by hydrolysis) subsequently added to 
the media was incorporated into the cells and was rapidly converted 
to PS in the ER (wild-type cellular levels of PS were restored within 
10 min; Supplementary Fig. 8a)'*'”. We were then able to measure the 
kinetics of Osh6/Osh7-mediated transfer of newly synthesized PS 
using the Lact C2-GFP probe’. In the absence of Osh6/Osh7, or in 
the presence of a mutated Osh6 deficient in PS binding (Leu69Asp; see 
below), newly formed PS remained trapped in the ER (Fig. 2d and 
Supplementary Fig. 8b, c). By contrast, functional Osh6/Osh7 trig- 
gered a rapid (within 2 min) translocation of PS from the ER to the 
PM. The rate at which newly synthesized PS equilibrates in the PM was 
slower in the Osh6/Osh7 double mutant than in wild-type cells 
(Supplementary Fig. 8b). Osh6/Osh7-mediated PS transfer took place 
at 4 °C, further indicating that PS transport is largely independent of 
vesicular trafficking. In osh6Δosh7Δ strains at 4 °C, a small fraction of 
the Lact C2-GFP probe remained bound to the PM, indicating that 
additional, Osh6/Osh7-independent, non-vesicular mechanisms of PS 
transport may exist. Finally, we also examined whether the transloca- 
tion of Osh6 to other cellular membranes is sufficient to redirect PS 
to this new location (Fig. 2e, f). In an Osh7-knockout strain, we co- 
expressed Osh6 fused to human FKBP (FK506-binding protein) and 
Vph1 (a trans-membrane, vacuolar protein) fused to human FRB 
(FKBP-rapamycin-binding domain). Rapamycin induced the reloca- 
lization of Osh6-FKBP to vacuoles, particularly to regions adjacent to 
the ER (Fig. 2e). Upon Osh6 translocation, newly synthesized PS also 
became quickly (within 5 min) redirected to this organelle (Fig. 2f), 
indicating that Osh6/Osh7 can directly transport PS between cellular 
membranes in vivo. Together, these results provide the first direct 
evidence for non-vesicular transfer of PS from its site of biosynthesis 
(the ER) to its site of biological activity (the PM). We propose that 
members of the Osh protein family are integral parts of the cellular 
machinery regulating PS homeostasis and signalling and are respon- 
sible for its accumulation at the PM. 

To gain insights into the mechanisms for specific PS recognition, we 
determined the structure of the Osh6-PS complex by X-ray crystal- 
lography at 1.95 Å resolution (Supplementary Fig. 9a and Supplemen- 
tary Table 1). Similar to Kes1'*”° (20% sequence identity to Osh6), the 
Osh6 structure consists of an incomplete β-barrel that forms the 
ligand-binding tunnel (residues 74-327), an amino-terminal ‘lid’ (resi- 
dues 35-73) covering the tunnel entrance and a carboxy-terminal 
mainly α-helical region that does not contact the ligand (residues 
328-434) (Supplementary Fig. 9a). The structure also revealed a clear 
electron density map in the ligand-binding pocket of Osh6 in which we 
could model PS with a 17 carbon-long saturated acyl chain (sn-1 
position) and an 18 carbon-long unsaturated acyl chain (oleic acid; 
sn-2 position). The Osh6 tunnel, which can accommodate the long 
aliphatic chain of PS, is predominantly hydrophobic and is about 
8-10 Å deeper than the one reported for Kes1 (Supplementary 
Fig. 9b)’*°. The head group and the sn-2 acyl group of PS are oriented 
towards the tunnel entrance and the sn-1 acyl group towards the 
bottom of the tunnel (Supplementary Fig. 9a). The entire PS molecule 
is involved in extensive interactions with 36 residues from Osh6 
(Fig. 3a). In particular, the α1-β1 loop in the lid and the region near 
the tunnel entrance are crucial for PS recognition (see below). This 
implies a series of hydrophobic residues in the α1-β1 loop (Leu 64, 
Ile 67, Leu 69 and Ile 73) and at the tunnel entrance (Val 124) that form 
Figure 2 | In vivo, Osh6/Osh7 transport PS from the ER to the PM. 

a, Lipidomics analyses (glycerophospholipid, PL) of PM or ER fractions (top), 
and of total cell extracts (bottom). b, Osh6/Osh7 are required for the 
accumulation of PS to the PM. Rtnl-mCherry marks the ER. c, Schematic view 
of the role of Osh6/Osh7 in PS trafficking. CDP-DAG, cytidine diphosphate- 
diacylglycerol. d, Kinetics of Osh6-mediated PS transfer in vivo. Lact C2-GFP 
re-localization after the addition of 18:1 lyso-phosphatidylserine (Lyso-PS) in 


extensive non-polar contacts with the sn-2 acyl group of PS (Fig. 3b). 
Several residues in the α1-β1 loop (Leu 64, Ile 67 and Leu 69) and at the 
tunnel entrance (Lys 126, Asn 129 and Ser 183) are also engaged in 
polar interactions with the PS head group, especially with the carboxy- 
late anion. This could explain the specificity of Osh6 for PS versus the 
related PE and PC, which are products of PS decarboxylation 
(Supplementary Fig. 10). 

Only nine of the PS-interacting residues in Osh6 are conserved in 
Kes1. Not surprisingly, Osh6 in which these conserved interacting 
residues are mutated—including Lys126Ala, Asn129Ala (polar con- 
tacts with the PS carboxylate anion) or Ile73Asp (hydrophobic contact 
with PS)—binds very poorly to PS in vitro (Fig. 3c), indicating that they 
have conserved roles in ligand binding. By contrast, mutations of 
conserved, polar amino acids located near the PS head group, but 
not interacting with it (His 157, His 158, Lys 351 and Lys 182), had 
no effect. Importantly, most Osh6 residues that bind to PS are not 
conserved in Kes1 and probably give rise to Osh6 specificity. In par- 
ticular, the α1-β1 loop in the lid region, which is poorly conserved in 
Kes1 (where it partially folds as a short 3/10 helix'*’), makes extensive 
polar and non-polar contacts to PS (Fig. 3b). Osh6 in which Leu 69 is 
substituted for a charged residue (Asp) or for the corresponding Kes1 
residue (Ala) interacted poorly with PS, whereas substitution for 
another hydrophobic residue (Phe) had no effect (Fig. 3c). Further 
supporting the notion that the α1-β1 loop has a key role in specific 
PS binding, additional mutations of Thr 71 (to Pro) or the substitu- 
tion of the entire α1-β1 loop with the corresponding loop of Kes1 
completely abrogated the association with PS. In total, we replaced 
six Osh6-specific, PS-contacting residues (Leu 69, Arg 82, Gln 86, 
Val 124, Ser 183 and Met 194) with the corresponding ones in Kes1 
and for five of these, this led to significant decreases in PS binding 
(Fig. 3c). The crystal structure of Osh6 demonstrates that binding to 
PS involves a series of amino acids that are not conserved in Kes1 
Figure 3 | Structure of the Osh6-PS complex. a, An overview of Osh6-PS 
interactions. Cα atoms (spheres) of polar (green), positively charged (blue) and 
hydrophobic (black) residues that form contacts to PS (sticks). Underlined, 
mutated residues. Grey, mutated residues not forming contact. Light grey 
isomesh map: PS 2Fo − Fc electron density map at the 1.0σ level. Stars, residues 
conserved in Kes1. b, Close-up of region forming key contacts with PS (sticks). 
Blue, α1-β1 loop. Dashed lines, hydrogen bonds <3.5 Å. c, In vitro PS-binding 
activities of Osh6 mutants. Kes1-like mutants introduce corresponding Kes1 
amino acids. The ‘loop-swap’ (L.S.) mutant: Osh6 α1-β1 loop was swapped for 
the corresponding segment in Kes1. Error bars represent standard deviation, 
n ≥ 3. 
Figure 4 | Osh6/Osh7 are the first representatives of a PS-binding OSBP 
subfamily conserved in humans. a, A phylogenetic analysis of all OSBP 
domains. Black, S. cerevisiae OSBPs; magenta, human OSBPs. Bar graphs, 
sequence identities derived from pair-wise alignment with Osh6-OSBP (blue) 
and Kes1-OSBP (grey). b, Sequence features of the Osh6 α1-β1 loop are 
conserved in the five clustered human OSBP domains. Blue and yellow: 


and may represent the signature of a new subfamily of OSBPs with 
specificity for PS. 

Our results show that in yeast, Osh6 and Osh7 define a unique 
OSBP subfamily with structural features adapted for PS recognition. 
To determine whether this subfamily is conserved in other kingdoms, 
we performed a phylogenetic analysis with all OSBP domains entered 
in the Pfam database”® (see Methods). The seven OSBP domains of 
Osh proteins in S. cerevisiae distribute broadly across the entire tree 
(Fig. 4a). Those of Osh6/Osh7 and Kes1 belong to two clades, consist- 
ent with the view that they define distinct subfamilies. Importantly, the 
Osh6/Osh7 clade also contains OSBP domains in metazoans, inclu- 
ding five human proteins, ORP5, ORP8, ORP9, ORP10 and ORP11, 
suggesting a broad conservation across eukaryotes. In particular, we 
observed a high degree of conservation of the sequence signature that is 
required for Osh6 to bind specifically to PS (see above)(Fig. 4b). 
Within the α1-β1 loop, we could define sequence motifs characteristic 
of the Osh6/Osh7 and Kes1 subfamilies (Supplementary Fig. 11). We 
selected two human OSBP domains that clustered within the Osh6/ 
Osh7 subfamily, one from ORP5 and one from ORP10, and measured 
their binding specificity in vitro, using liposomes derived from brain 
total lipid extract. We observed that the recombinant OSBP domains 
from ORP5 and ORP10 efficiently and specifically extracted PS 
(Fig. 4c). This supports the notion that Osh6 and Osh7 are the first 
representatives of a broader subfamily of OSBPs that transport PS and 
which are also conserved in humans. 

Other members of the OSBP family are known to bind sterols and PI 
lipids’. In humans, many are involved in diseases*’”* and in yeast they 
share a common, essential function, as only one of the seven proteins is 
required for yeast survival’*. We demonstrate that PS transport repre- 
sents a new function for some members of this protein family, but 
clearly, binding to a specific lipid is not a common and essential task. 
De facto, the name ‘oxysterol-binding protein’ is a misnomer that 
should be revised and we propose that ‘lipid sensing and sorting’ 
(Lss) protein could be more appropriate. Several models could account 
for the Osh6/Osh7-mediated PS transport against an apparent con- 
centration gradient. A likely scenario relies on the fact that PS associ- 
ates with specific PM microdomains’”™*. This may contribute to its 
sequestration from the pool of ‘free’ PS. Consistent with this hypo- 
thesis, we observed that Osh6 and PS (Lact C2-GFP) partition in 
Osh6- and Kes1-specific residues, respectively. Green, residues common to 
both. c, The OSBP domains of ORP5 and ORP10 bind PS in vitro. OSBP-bound 
fractions of individual lipids are quantified. Bottom: the PS:OSBP molar ratios 
for recombinant Osh6, ORP5-OSBP and ORP10-OSBP. Chl, cholesterol. Error 
bars represent standard deviation, n = 3. 


spatially discrete domains with distinct dynamic properties (Sup- 
plementary Fig. 7a). In addition, in the ER, the spatial confinement 
of PS-synthesizing enzymes, for example, Cho1, at PM contact sites” 
may produce high, local PS concentrations that also contribute to the 
transfer of PS from ER to the PM. An important outcome of our study 
is the ease by which in vivo-assembled LTP-lipid complexes can be 
retrieved from cells. This work clearly demonstrates the feasibility of 
broader efforts that would include all LTPs in higher eukaryotes***°. 


METHODS SUMMARY 

Purification of LTP-TAP fusions and lipid analysis by HPTLC. The TAP 
fusions were affinity purified on IgG agarose resins and eluted with 20 µg tobacco 
etch virus protease (TEV) at 4 °C. The eluates (100 µl) were loaded on a gel 
filtration column and 150 µl fractions were collected. Lipids were extracted from 
samples by sequential addition of solvents. The lipid fraction was sprayed on 
HPTLC silica gel 60 plates that were developed with solvent systems for neutral 
lipids and phospholipids. The plates were dripped in 10% (w/v) CuSO4 in 8% (v/v) 
aqueous phosphoric acid, charred at 145 °C for 4.5 min, and scanned for fluores- 
cence detection at 488 nm extinction and 530 nm emission wavelengths. 
Crystallization of Osh6-PS complex. The purified Osh6 was treated with tenfold 
molar excess of PS. The Osh6-PS complexes were purified with Ni-NTA resins 
followed by gel filtration, and crystallized using vapour diffusion techniques. 
Diffraction data were collected at beamline ID23-1 of the European Synchro- 
tron Radiation Facility (ESRF, Grenoble, France). The structure was solved by 
molecular replacement. 

The protocols for the production of recombinant proteins, the in vitro binding 
and transfer assays, the cell fractionation, lipidomics, the visualization of intracel- 
lular PS and the phylogenetic analysis of OSBP domains are described in the 
Methods section. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 

Purification of LTP-TAP fusions. Yeast TAP strains* were grown in 181 YPAD 
medium to D600 = 3.5 (OD600 = 3.5). Yeast pellets were resuspended in one 
volume of yeast lysis buffer (50 mM Tris-HCl pH 7.5, 500 mM NaCl, 1.5 mM 
MgCl2, protease inhibitor cocktail (Complete, EDTA-free; Roche), 1.0 mM 
AEBSF, and 0.5 mM dithiothreitol (DTT)) and lysed using a planet mill and 
0.5-mm zirconia/silica beads (BioSpec Products). The lysates were centrifuged 
at 27,000g for 20 min and at 200,000g for 1 h. The supernatants were clarified 
using 0.22-µm Express PLUS filters (Millipore), and loaded (at ~1 ml min⁻¹) on 
100 µl IgG agarose resins (GE Healthcare) packed in Mobicols columns 
(MoBiTec). The resins were washed with 10 ml yeast lysis buffer without protease 
inhibitors, and incubated with 20 µg tobacco etch virus protease (TEV) for 1 h at 
4 °C for elution. The eluates (100 µl) were loaded on an analytical Superdex 200 
SEC column (3.9 ml column volume) equilibrated with 50 mM Tris-HCl pH 7.5, 
500 mM NaCl, and 150 µl fractions were collected. 

Lipid analysis by HPTLC. All solvents for HPTLC were from Merck Chemicals 
(HPLC grade). All lipids were purchased from Avanti Polar Lipids unless other- 
wise stated. Lipids were extracted from samples by sequential addition of 3.75 
volume chloroform:methanol (1:2 v/v), 1.25 volume chloroform, and 1.25 volume 
0.5% acetic acid in 500 mM NaCl followed by 30s vortexing after each step. After 
centrifugation at 1,200 r.p.m. for 10 min, the bottom layer was sprayed on HPTLC 
silica gel 60 plates (10 cm × 10 cm) (Merck) in 3-mm bands using the TLC sampler 
4 (Camag). The plates were developed with a solvent system of hexanes:diethy- 
lether:acetic acid 80:20:2 for the analysis of neutral lipids. For phospholipids, the 
plates were sequentially developed with (1) dichloromethane:ethyl acetate:acetone 
80:16:4 and (2) chloroform:acetone:isopropanol:ethyl acetate:ethanol:methanol: 
H2O:acetic acid 30:6:6:6:16:28:6:2*'. The plates were dried in vacuum and dripped 
in 10% (w/v) CuSO4 in 8% (v/v) aqueous phosphoric acid, charred at 145 °C for 
4.5 min, and scanned for fluorescence detection® using Pharos FX Plus molecular 
imager (Bio-Rad) (488 nm extinction and 530 nm emission wavelengths). Lipids 
applied as standards were detected with a sensitivity of <10 picomole. HPTLC 
images were analysed and processed using ImageJ. 

Assessment of protein-lipid interactions from SEC elution profiles. Only
lipids co-eluting with LTPs were considered to represent specific ligands. Abun- 
dant yeast lipids (PC, PI, PE and ergosterol) that eluted near the void volume peak 
were filtered out as background. All interactions were detected in at least two 
independent experiments, except the binding of Sfh5 (systematic name Yjl145w) 
to PI which was only detected in one of the duplicates. 

Phospholipid analysis by LC/MS/MS. Phospholipids in the SEC peak fractions of
Osh6-TAP (fractions 9-10), Kes1-TAP (fractions 9-10) and Sec14-TAP (fractions
11-12) were analysed using LC/MS/MS by Avanti Polar Lipids. The two peak
fractions for each TAP-fusion were pooled, and 200 µl were subjected to lipid
extraction. The samples were mixed with one volume each of methanol and
chloroform (HPLC grade) and vortexed for 30 s. The mixture was centrifuged at
1,000 r.p.m. for 5 min, and the bottom layer was transferred to a new test tube. The
upper layer was subjected to re-extraction using 200 µl chloroform. The two
bottom layers were combined and washed twice with 500 µl water (HPLC grade).
Appropriate and known quantities of phospholipids were added to the washed
bottom layer as internal standards. The samples were dried to a residue under
nitrogen at room temperature. Lipids were dissolved in 200 µl methanol. The
samples were analysed for PC, PE, PI, PA, PS and PG on a Waters Acquity
UPLC/AB Sciex 5500 LC/MS system. Each phospholipid group was assayed by
individual reverse-phase chromatography/multiple reaction monitoring (MRM)
methods and quantified against the respective internal standard compounds.
Identified compounds exhibiting >3:1 signal-to-noise response were quantified.
Production of recombinant proteins in E. coli. Codon-optimized synthetic
genes for Osh6, Kes1, ORP5-OSBP (Uniprot entry code Q9H0X9, residues 357-788),
and ORP10-OSBP (Uniprot entry code Q9BXB5, residues 340-764) (GenScript,
Entelechon) were inserted into the pETM11-SUMO3GFP vector using
BamHI and NotI sites. Expression vectors for Osh6 mutants were constructed
on the pETM11-SUMO3GFP vector with Osh6 insertion using QuikChange
Lightning Site-Directed Mutagenesis Kit (Agilent Technologies). The N-terminal
hexahistidine-SUMO3 tagged proteins were expressed in BL21 Star cells
(Invitrogen) grown in LB medium by induction (at D600 = 0.6) with 0.4 mM
isopropyl-1-thio-β-D-galactopyranoside at 22 °C for 5 h (Osh6 and Osh6 mutants), at 18 °C
overnight or at 28 °C for 3 h (ORP5-OSBP and ORP10-OSBP). Collected cells were
suspended in the lysis buffer (50 mM Tris pH 7.5, 500 mM NaCl, 20 mM imida- 
zole, 0.5mM DTT, protease inhibitor cocktail (Complete, EDTA-free; Roche)) 
and lysed by sonication. The expressed proteins in the soluble fractions were 
captured on Ni-NTA agarose resins (Qiagen) and eluted in 50 mM Tris pH 7.5,
250 mM NaCl, 300 mM imidazole. After tag cleavage with 1:200 molar ratio of
hexahistidine-tagged SenP2 protease (EMBL) and dialysis against 50 mM Tris
pH 7.5, 250 mM NaCl, 20-80 mM imidazole, the proteins were loaded on the
Ni-NTA agarose resins. The flow-through was loaded on the Superdex 200 16/60
SEC column (GE Healthcare) equilibrated with 50 mM HEPES pH 7.5, 250 mM
NaCl. For the preparation of the crystallization sample, Osh6 was further purified on
the Mono S 5/50 GL cation exchange chromatography column (GE Healthcare).
Liposome preparation. PC and indicated mol% of additional lipids were mixed
and dried into thin films in glass vials using argon flow and then under vacuum
for at least 30 min. Liposomes were formed by rehydrating the lipid films in the
rehydration buffer (10 mM HEPES pH 7.4, 250 mM NaCl unless otherwise
specified) to a total lipid concentration of 3.8 mM at 62 °C for 1 h. Liposomes were
filtered 21 times through the Nuclepore Track-Etched Membranes (Waters) on a
Mini-extruder (Avanti Polar Lipids) to define the sizes (when specified). Liposomes
of 50% (w/w) yeast total lipid extract (Avanti Polar Lipids) or 50% (w/w) porcine
brain total lipid extract (Avanti Polar Lipids) were prepared to 3.0 mg ml⁻¹ total
lipid concentration and filtered to 400 nm.

Lipid-profiling of recombinant Osh6. Recombinant Osh6 was incubated overnight
at 25 °C with 0.2% (v/v) Nonidet P40 Substitute (Fluka) or liposomes solely
consisting of PS at tenfold molar excess, and re-purified using the affinity of
untagged Osh6 to Ni-NTA agarose resins. The treated and untreated recombinant
Osh6 (250 µg) was loaded on the analytical Superdex 200 SEC column, and the
peak fractions were analysed for lipid contents using HPTLC.

In vitro lipid-binding assay. Recombinant Osh6 (20 µM) was incubated with
liposomes containing 10 mol% (0.20 mM) PS, PE, PG, PI, CA or Erg in 200 µl assay
buffer (10 mM HEPES pH 7.4, 250 mM NaCl) at 25 °C for 30 min. OSBP proteins
were incubated with liposomes of 50% (w/w) yeast total lipid extract or 50% (w/w)
porcine brain total lipid extract (total lipid concentration of 0.15 mg ml⁻¹). After
centrifugation at 80,000 r.p.m. for 30 min (TLA 100 rotor) to pellet the liposomes,
the supernatant (110 µl) was analysed by HPTLC (lipids) and SDS-PAGE
(proteins). Lipids were densitometrically quantified relative to the individual lipids in
the input liposomes (applied on the same HPTLC plates). Proteins were
densitometrically quantified relative to the protein standards (loaded on the same
Coomassie-stained gels). Standard deviations are derived from assays performed
at least in triplicate.

Biochemical lipid-transfer assay. The ‘heavy’ donor liposomes (400 nm) were
prepared with PC and 10 mol% of PS, PE, PG or Erg, in the rehydration buffer
containing 0.75 M sucrose, and pelleted at 16,100g for 15 min and washed twice
with the rehydration buffer. The ‘light’ acceptor liposomes (100 nm) were
prepared with PC. Recombinant Osh6 (1.0 µM) was incubated with the donor and
acceptor liposomes (each corresponding to 2.0 mM total lipids) in 150 µl assay
buffer (10 mM HEPES pH 7.4, 250 mM NaCl) at 25 °C for 30 min. After pelleting
the donor liposomes at 16,100g for 30 min, lipids in the supernatants (100 µl) were
analysed (HPTLC). Transferred lipids were densitometrically quantified relative
to the individual lipids in the input liposomes loaded on the same HPTLC plates.
The lipid transfer activities of yeast TAP-fusions were determined using the same
assay (the donor and acceptor liposomes each correspond to 1.3 mM total lipids).
The TAP-strains were grown and lysed as described above. The TAP-fusions were
captured on IgG agarose resins and washed with the lysis buffer (without protease
inhibitors) with 0.15% (v/v) Nonidet P40 Substitute and eluted with 10 µl TEV in
~200 µl assay buffer. Fifty-microlitre eluates were used for each assay.
PS-transfer assay using fluorescence microscopy. The ‘heavy’ acceptor
liposomes were prepared with 0.1% BODIPY FL DHPE (Invitrogen), and 1.0%
DSPE-PEG-Biotin (Avanti Polar Lipids) in the assay buffer (10 mM HEPES pH 7.4,
300 mM NaCl) with 0.75 M sucrose. After sonication for 30 s, the liposomes were
pelleted at 16,100g and washed three times in the assay buffer. The ‘light’ donor
liposomes (100 nm) with 30% PS were prepared in assay buffer. Recombinant
Osh6 (1.0 µM) was incubated with the donor and acceptor liposomes
(corresponding to 2.0 mM and 1.0 mM total lipids, respectively) in 200 µl assay buffer
at 25 °C for 3 h. The acceptor liposomes were then pelleted at 16,100g for 15 min
and washed three times in 10 mM HEPES pH 7.4, 150 mM NaCl, and immobilized
on glass slides coated with biotin-labelled bovine albumin (Sigma) and
streptavidin (Sigma). The immobilized acceptor liposomes were incubated in 0.5 µg ml⁻¹
solution of annexin V-Cy3.18 (Sigma) in 10 mM HEPES pH 7.5, 140 mM NaCl,
2.5 mM CaCl2 for at least 10 min and imaged on the Olympus IX81 microscope
equipped with a ×100/NA 1.45 objective lens and Hamamatsu Orca-ER camera.
Creation of yeast strains. Yeast strains with deleted or depleted
lipid-metabolizing, OSH6, OSH7 and KES1 genes, or C-terminally mCherry-tagged RTN1 and
VPH1 were created in the SGA Y7039 strain (a gift from C. Boone) derived from
the BY4741 background (MATα can1Δ::STE2pr-LEU2 lyp1Δ ura3Δ0 leu2Δ0
his3Δ1 met15Δ0) using standard yeast molecular biology procedures. Strains
used for rapamycin-inducible heterodimerization carried fpr1Δ and the Ser1972Arg
mutation in the TOR1 gene (MATa his3Δ200 leu2-3,112 ura3-52 lys2-801 tor1-1
fpr1::). For the overexpression of Osh6 and Osh6 mutants, the cloned Osh6 gene
was inserted into the plasmid pRS315. The QuikChange Lightning Site-Directed
Mutagenesis Kit (Agilent Technologies) was used to introduce mutations. Osh6
and its mutants were expressed with a C-terminal mCherry tag from the ADH1
promoter. The ER markers GFP- and mCherry-HDEL were constructed as
previously described in the plasmid pRS315 and expressed from the ADH1
promoter. Lact-C2-GFP and PLCδ-PH-GFP were expressed from plasmids
p416-GFP-Lact-C2 (Haematologic Technologies) and pRS426-PLCδ-PH (a gift from
S. Emr), respectively.

Phospholipid analysis and subcellular fractionations. All experiments were
performed in triplicate. To ensure reproducible determination of phospholipids,
yeast strains were grown at litre scale and lysed mechanically with a planet mill
and 0.5-mm zirconia/silica beads (BioSpec Products) (>90% lysis efficiency).
Strains were grown in 1.0 l synthetic defined (SD) medium to D600 of ~1.5. The
pellets were suspended in 10 ml 25 mM imidazole pH 7.0, 0.4 M sucrose
(supplemented with protease inhibitor cocktail (Complete, EDTA-free; Roche), 1.0 mM
AEBSF). After lysis, the extracts (100 µl) were subjected to phospholipid
extraction and profiling on HPTLC as described above. Ethanolamine or choline was
added to the growth medium to 1.0 mM as indicated.

For fractionation of PM and microsomes, strains were grown and lysed as
described above. The lysates were subjected to differential centrifugation and
sucrose step gradient centrifugation as described previously. The mouse
anti-Dpm1 yeast monoclonal antibody (Invitrogen, catalogue no. A-6429) and goat
anti-Pma1 polyclonal antibody (Santa Cruz Biotechnology; catalogue no. sc-19389)
were used to assess the enrichment of membrane fractions by western blot.

For determination of PS produced in cho1Δ strains from exogenously added
lyso-PS, strains were grown to D600 of ~1 in 1.0 ml SD medium supplemented
with 1.0 mM ethanolamine. Lyso-PS was dried under vacuum to remove the
solvent and suspended in the SD medium to 54 µM. 1.0 ml of the yeast culture
was mixed with the same volume of lyso-PS solution and incubated for 10 min at room
temperature. The cells were then cooled on ice, pelleted and washed twice in
ice-cold water, resuspended in 200 µl water, and lysed mechanically (vortexing) using
0.5-mm zirconia/silica beads. PS levels were determined (in 100 µl lysate) as
described above.

Visualization of intracellular PS. Yeast strains were grown overnight at 30 °C in
relevant SD medium (plasmid selection) and diluted to D600 < 0.05 for further
growth for ~5 h. The cells were adhered on glass slides coated with
Concanavalin A (Sigma) and imaged with an Olympus IX81 microscope equipped with a
×100/NA 1.45 objective lens and Hamamatsu Orca-ER camera. Total internal
reflection fluorescence microscopy was performed with 488 nm and 561 nm
solid-state lasers (Coherent). For Osh6 translocation (rapamycin-inducible
heterodimerization), the adhered cells were incubated for 30 min at 30 °C in the SD
medium containing 1.4 µM rapamycin. To monitor cellular localizations of PS
produced from exogenously added lyso-PS in the cho1Δ strains, lyso-PS was
dissolved in SD medium as described above, and the resultant solutions were
centrifuged for 2 min to remove insoluble materials. Equal volumes of lyso-PS
solution were added to cultures of the adhered cells (performed at room
temperature or on ice). Images were processed with ImageJ.

Visualization of Osh localizations upon perturbed lipid metabolism. Yeast
strains harbouring both GFP-fused OSH genes and deletions/depletions of genes
of lipid metabolic enzymes were generated using robot-facilitated mating,
sporulation and strain selection of the standard SGA protocols. The genotypes of the
final strains were confirmed by PCR. For live-cell imaging, cells were inoculated in
SD medium without tryptophan and histidine and grown overnight at 30 °C. Cells
were diluted to D600 = 0.1, adhered on Concanavalin A (Sigma)-coated 96-well
glass-bottom plates, and imaged on a fully motorized Olympus fluorescence
microscope system (Olympus IX81) at 30 °C (temperature-controlled incubator,
EMBL manufacture). Images with 16-bit readout were acquired using a ×100
oil objective with an NA of 1.45, a low-noise, highly sensitive ORCA-R camera
(Hamamatsu), MT20 illumination system, and Uniblitz Electro-Programmable
Shutter system. Acquired images were processed with ImageJ.

Phylogenetic analysis of OSBP domains. A sequence alignment file containing
all OSBP domain entries was obtained from the Pfam database
(http://pfam.sanger.ac.uk/), and processed in Jalview to reduce redundancy and
eliminate entries shorter than 320 amino acids or containing unassigned amino acids. A
phylogenetic analysis was performed on the processed sequence alignment at the PHYLIP
3.67: protdist server (http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::protdist).
The generated tree was visualized in iTOL (http://itol.embl.de/upload.cgi).
Sequence logos were created in WebLogo (http://weblogo.berkeley.edu/) after
the realignment of protein groups using T-coffee
(http://www.ebi.ac.uk/Tools/msa/tcoffee/).
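
The length and unassigned-residue filters described above can be illustrated with a minimal Python sketch. It is not the Jalview procedure used here: the function names, the use of 'X' to mark unassigned residues, and the choice to measure length on the ungapped sequence are assumptions made for illustration, and redundancy reduction is not covered.

```python
def ungapped(aligned_seq):
    """Strip alignment gap characters ('-' and '.') from an aligned sequence."""
    return aligned_seq.replace("-", "").replace(".", "")


def keep_entry(aligned_seq, min_length=320, unassigned="X"):
    """Return True if the entry is long enough and has no unassigned residues.

    Assumption: length is counted on the ungapped sequence and unassigned
    residues are written as 'X'.
    """
    residues = ungapped(aligned_seq).upper()
    return len(residues) >= min_length and unassigned not in residues


def filter_alignment(entries, min_length=320):
    """Filter a {name: aligned_sequence} mapping with keep_entry()."""
    return {
        name: seq
        for name, seq in entries.items()
        if keep_entry(seq, min_length=min_length)
    }
```

With a toy threshold such as `filter_alignment(entries, min_length=5)`, entries that are too short or that contain 'X' are dropped and the aligned (gapped) form of the remaining entries is returned unchanged.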

Crystallization of Osh6-PS complex. The purified Osh6 (12 mg ml⁻¹) was
treated with tenfold molar excess of PS in liposomes at 25 °C overnight in
20 mM HEPES pH 7.4, 500 mM NaCl, and 0.5 mM DTT. Osh6 was purified using
Ni-NTA resins, and Superdex 200 SEC column equilibrated with 10 mM HEPES
pH 7.4, 250 mM NaCl.
Screening (mosquito crystallization robot, TTP Labtech) using the sitting drop
vapour diffusion technique in 96-well plate format gave initial microcrystals of
Osh6-PS with drop and reservoir volumes of 0.2 µl and 100 µl, respectively. Later
crystals were optimized to larger size in 24-well plates (Hampton Research) at
15 °C by the hanging drop vapour diffusion technique. Equal volumes (0.75 µl/
0.75 µl) of protein (13 mg ml⁻¹) and crystallization buffer (0.1 M MES, 13% (w/v)
PEG6000, 5% (w/v) MPD, pH 6.5) were mixed on a coverslip, which was
subsequently equilibrated against a reservoir containing 500 µl of crystallization buffer.
A few microcrystals appeared in ~3 days. At this stage 1 µl of reservoir was added
into the drop. This delayed the crystal growth and gave optimal monoclinic
crystals, which grew in about eight days to a size of about 80 µm × 70 µm × 50 µm.
They were harvested by transfer to a cryoprotectant solution (15% ethylene glycol
in the mother liquor), rapidly recovered in a nylon loop (Hampton Research) and
flash-frozen in liquid nitrogen.
Data collection, crystal characterization and refinement. Data were collected at
beamline ID23-1 of the European Synchrotron Radiation Facility (ESRF,
Grenoble, France) using an ADSC Quantum Q105 CCD detector, at a wavelength of
0.93340 Å (Supplementary Table 1). Crystals were kept at a temperature of 100 K
in a stream of nitrogen gas for data collection. The diffraction data clearly
suggested space group 5. A Matthews coefficient of 2.42 Å³ Da⁻¹ and a
solvent content of 49% were obtained, assuming two molecules were present in the
asymmetric unit. The data were processed and scaled using the XDS program
package. The phase problem was solved by molecular replacement using
PHASER with the individual structure of Kes1 (PDB code 1ZHT) as search
model (20% sequence identity to Osh6). Flexible loops (60-76, 124-133,
292-311, 367-381 and 390-416) of the search model were truncated for the phase
solution. The cycles of adjustments to the electron density were done using the
graphical program COOT.
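
The link between the quoted Matthews coefficient and solvent content can be checked with a short Python sketch. It assumes the conventional constant of 1.23 Å³ Da⁻¹ in Matthews' relation (a textbook value, not stated in the text); the function names are hypothetical, and no unit-cell dimensions are assumed because none are given here.

```python
# Conventional constant in Matthews' relation, in Å^3 per Da (assumed value).
PROTEIN_VOLUME_PER_DALTON = 1.23


def matthews_coefficient(cell_volume_A3, n_symmetry_ops, n_mol_per_asu, mol_weight_da):
    """Crystal volume per unit of protein mass (Å^3/Da)."""
    return cell_volume_A3 / (n_symmetry_ops * n_mol_per_asu * mol_weight_da)


def solvent_fraction(vm):
    """Estimated solvent fraction for a given Matthews coefficient."""
    return 1.0 - PROTEIN_VOLUME_PER_DALTON / vm


# Matthews coefficient quoted in the text.
print(f"{solvent_fraction(2.42):.0%}")  # prints 49%
```

Under this assumption, a coefficient of 2.42 Å³ Da⁻¹ corresponds to a solvent fraction of about 0.49, in line with the 49% reported.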

The alternate rounds of computational refinement were performed using
PHENIX. The first electron density map already showed interpretable density
for the major fold of the Osh6 molecule. Most of the amino acid residues were
clearly visible in the electron density map, except side chains of a few residues at the
N-terminal region and β11-β12, which were disordered. Therefore, side-chain
occupancies of these atoms were kept at zero. Water molecules were modelled
using COOT and included in the refinement cycles at the later stage. Manual
fitting of the model following the refinement cycles eventually led to a final Rwork of
19.7% and Rfree of 23.5% with good stereochemistry (Supplementary Table 1). In
the final model (residues 36-434 (molecule A) and 35-434 (molecule B)), no
residues fall in disallowed regions of Ramachandran space (95.2% preferred,
100% allowed) (using PROCHECK). Molecular diagrams were drawn using
PyMOL (http://pymol.sourceforge.net). Structure-based sequence alignment was
produced using PROMALS3D and corrected manually on the basis of the
three-dimensional structures.
CORRECTIONS & AMENDMENTS


CORRIGENDUM 
doi:10.1038/nature12498


Corrigendum: A CRISPR/Cas system mediates bacterial innate immune evasion and virulence


Timothy R. Sampson, Sunil D. Saroj, Anna C. Llewellyn, 
Yih-Ling Tzeng & David S. Weiss 


Nature 497, 254-257 (2013); doi:10.1038/nature12048 


In this Letter, we described two small RNAs (scaRNA and tracrRNA) 
within the Francisella novicida CRISPR/Cas locus that are necessary 
for repression of an endogenous transcript. Concurrent to our studies, 
E. Charpentier’s group performed RNA sequencing analyses of multiple
type II CRISPR loci to identify tracrRNA in F. novicida and other
species¹, based in part on the co-processing of tracrRNA:crRNA by
RNase III (ref. 2). These observations indicate that the regulatory 
RNA we annotated as scaRNA is the tracrRNA, and that the RNA 
we annotated as tracrRNA is the scaRNA. Furthermore, RNAseq data 
show that the transcriptional direction of the crRNA array is the oppo- 
site of what was predicted. Finally, our predictions of tracrRNA and 
scaRNA within Supplementary Table 2 are incorrect based on this 
transcriptional analysis; the scaRNAs predicted in Neisseria meningi- 
tidis 92045, Listeria monocytogenes SLCC2482 and Streptococcus 
pyogenes M1 GAS are in fact within the crRNA array transcript, as is 
the predicted tracrRNA in Campylobacter jejuni NCTC11168, whereas 
the predicted C. jejuni scaRNA is actually the tracrRNA. We include a 
corrected annotated Fig. 1a below, and apologize for any confusion 
about our incorrect nomenclature. We are grateful to E. Charpentier 
and her group for alerting us to the errors. 


1. Chylinski, K., Le Rhun, A. & Charpentier, E. The tracrRNA and Cas9 families of type
II CRISPR-Cas immunity systems. RNA Biol. 10, 726-737 (2013).

2. Deltcheva, E. et al. CRISPR RNA maturation by trans-encoded small RNA and host
factor RNase III. Nature 471, 602-607 (2011).


[Figure labels: cas9, cas1, cas2, cas4, crRNA array, scaRNA, tracrRNA; coordinates 810,052 bp to 818,940 bp.]

Figure 1 | This is the corrected Fig. 1a of the original Letter.

CORRIGENDUM
doi:10.1038/nature12527


Corrigendum: X-ray analysis on the nanogram to microgram scale using porous complexes


Yasuhide Inokuma, Shota Yoshioka, Junko Ariyoshi, 
Tatsuhiko Arai, Yuki Hitora, Kentaro Takada, 
Shigeki Matsunaga, Kari Rissanen & Makoto Fujita 


Nature 495, 461-466 (2013); doi:10.1038/nature11990 


Previously unnoticed ambiguities in the crystallographic data in our 
Article (specifically, non-negligible disorder of the miyakosyne A 
molecule), along with further work by three of the authors (Y.H., 
K.T. and S.M.) have revealed that the stereochemistry we assigned
at C14 of miyakosyne A is incorrect. Our new investigations confirm 
that we can indeed determine the molecular skeleton of miyakosyne 
A. However, we can only tentatively and not unambiguously identify 
all of the stereochemistry of miyakosyne A based on the data included 
in the original paper. A future publication will confirm the stereo- 
chemistry at the C14 moiety of miyakosyne A. The other conclusions 
of our paper are not affected by this correction. 


RETRACTION 
doi:10.1038/nature12497 


Retraction: Bird-like fossil footprints from the Late Triassic

Ricardo N. Melchor, Silvina de Valais & Jorge F. Genise


Nature 417, 936-938 (2002); doi:10.1038/nature00818 


In this Letter, we considered the bird-like footprints from the former 
Santo Domingo Formation of northwest Argentina to be of Late 
Triassic age. Recent radiometric dating¹ of the sedimentary sequence
containing these bird-like footprints (renamed as the Laguna Brava
Formation) indicated a Late Eocene age. Further geological studies²
suggest that the region suffered a complex deformation during the 
Andean orogeny, including block rotation. In consequence, our previ- 
ous inferences about the possible implications of this finding for the 
fossil record of Aves are no longer supported. This Retraction has not 
been signed by J.F.G. Correspondence should be addressed to R.N.M. 
(rmelchor@exactas.unlpam.edu.ar). 


1. Melchor, R. N., Buchwaldt, R. & Bowring, S.A. A Late Eocene date for Late Triassic 
bird tracks. Nature 495, E1-E2 (2013). 

2.  Vizan, H. et al. Geological setting and paleomagnetism of the Eocene red beds of 
Laguna Brava Formation (Quebrada Santo Domingo, northwestern Argentina). 
Tectonophysics 583, 105-123 (2013).


TECHNOLOGY FEATURE 


THE GENOME JIGSAW


Advances in high-throughput sequencing are accelerating 
genomics research, but crucial gaps in data remain.
BY VIVIEN MARX


To understand why high-throughput gene-sequencing technology often
produces frustrating results, says Titus Brown, imagine that 1,000 copies of
Charles Dickens’ novel A Tale of Two Cities have been shredded in a
woodchipper. “Your job is to put them back together into a single book,” he says.
That task is relatively easy if the volumes are identical and the shreds are
large, says Brown, a microbiologist and bioinformatician at Michigan State
University in East Lansing. It is harder with smaller shreds, he says, “because if
the sentence fragments are too small, then you can’t uniquely place them in the
book”. There are too many ways they might fit together. “And it’s harder still if
the original pile of books includes multiple editions,” he says.

Researchers in genetic sequencing today face 
a similar task. An organism’s DNA (made up
of four basic building blocks, or bases, denoted
by the letters A, T, C and G) is chopped into
short snippets, sequenced to determine the 
order of its bases and reassembled into what 
researchers hope is a good approximation of 
the organism's actual genome. 

Today’s high-throughput sequencing 
technology is remarkably powerful and has 
led to an explosion of sequencing projects 
in laboratories around the world, says Jay 
Shendure, a molecular biologist who develops 


sequencing methods at the University of 
Washington School of Medicine in Seattle. 
Thousands of patient tumours and more than 
10,000 vertebrate species have been or are 
being sequenced. High-throughput sequenc- 
ing is now an essential tool for basic and clini- 
cal research, with applications ranging from 
detection of microbial ‘bio-threats’ to finding 
better biofuels¹.

But some types of genomic DNA cannot be 
sequenced by high-throughput methods, leav- 
ing many frustrating gaps in data (see ‘What 
makes a tough genome?’). For example, a 
genome might contain long stretches in which 
the sequence simply repeats, as if Dickens
had filled whole pages with a word or sentence
written over and over, making that passage
hard, if not impossible, to reconstruct by 
the usual technologies. And the widespread 
adoption of next-generation sequencing has 
meant that the quality of genome assemblies 
has declined significantly over the past six 
years, says Evan Eichler, a molecular biologist 
also at the University of Washington. Although 
“we can generate much, much more sequence,
the short sequence-read data translate into
more gaps, missing data and more incomplete
references,” he says.

Incomplete genomes make it harder for 
researchers to identify and interpret sequence 
variations. “Instead,” Eichler says, “we focus 
only on the accessible portions, creating a 
biased view,” which in turn hinders efforts to
study the genetic basis of disease or how species 
have evolved. For example, the human-genome 
sequence, used as a reference by scientists 
around the world, has more than 350 gaps, 
says Deanna Church, a genomicist at the US 
National Center for Biotechnology Information. 
An updated reference genome is filling in much 
of the missing data, but “even with the release 
of the new assembly, there will still be gaps and 
regions that aren't well represented,” she says. “It
is definitely a work in progress.” 

More than 900 human genes are in regions 
where there is much repetition. About half of 
these genes are in areas so poorly understood 
that they are often excluded from biomedical 
study, says Eichler. Certain regions of chromosomes, notably those
near centromeres (where the two halves of a chromosome connect) and
telomeres (the ends of chromosomes) are especially incomplete in the
reference genome.

[Photo caption: Jay Shendure is working to develop the next generation of sequencing methods.]

This lack of information can have medical consequences. For example,
researchers have known for more than a decade that medullary cystic
kidney disease, a rare disorder that occurs in mid-life, can be caused by
mutations in a gene hidden somewhere along a
2-million-base-pair stretch of chromosome 1. 
Early detection of the mutation is the first step 
towards preventative therapies, but would 
require a DNA test. The gene, however, lies 
within a region rich in sequence repeats as 
well as in the bases guanine (G) and cytosine 
(C). Such ‘GC-rich’ regions, like repetitions, 
are difficult to sequence. 

Only by reverting to Sanger sequencing — 
a classic but more laborious approach — and 
combining it with special assembly methods 
were researchers able to decipher the DNA 


region involved in the disease. The results, 
which were published in February’, mean that 
a test to screen younger members of families 
affected by the disorder is now a possibility. 

Sanger sequencing is a painstaking process 
in which each type of DNA base is labelled 
with a different compound. The labelled DNA 
is then separated and the sequence is read. 
For the Human Genome Project, researchers 
combined Sanger sequencing with techniques 
to establish markers that locate where the 
sequences fit. The approach, which has been 
in use for decades, delivers a read accuracy and 
contiguity of sequence that are unmatched by 
current technology, Shendure says. “I couldn't 
do anything remotely approaching the qual- 
ity of what resulted from the project.” But the 
art of Sanger sequencing and its associated 
methods cannot be scaled up for the high- 
throughput sequencing projects done today. 
“We need to think about how to ‘next-generation-ify’
all of this,” he says.

Research to do just that is well under way, 
with a variety of methodologies that address 
problems such as repetitive sequences and 
GC-rich regions, as well as the knotty task of 
assembling complete genomes for organisms 
that have four or even eight copies of each 
chromosome, for example, as opposed to 
humans’ two. 

Some of the technologies on the horizon 
promise to deliver longer reads and, possibly, 
fewer headaches for researchers trying to 
assemble them. But until those instruments 
are on bench tops, scientists are combining 
new and old approaches to refine sequencing. 


RICH IS POOR 

Some of the newer approaches aim to tackle 
GC-rich regions. For high-throughput 
sequencing, DNA is often first chopped into 
short fragments, which are then amplified 
by polymerase chain reaction (PCR). But 
the enzyme used in PCR “has trouble getting 
through” GC-rich regions, says Shendure. As 
a result, GC-rich stretches can end up poorly 
represented in the DNA sample delivered to 
the sequencer, thus skewing the data. Some 
sequencing technologies, such as those made 
by Illumina, based in San Diego, California, 
use amplification before and during the 
sequencing process, causing further bias 
against GC regions. 
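
As an illustration of what ‘GC-rich’ means in practice, the following Python sketch computes the fraction of G and C bases in a sequence and in sliding windows along it. The function names, window size and step are assumptions chosen for the example, not values used by any sequencing platform.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=100, step=50):
    """Return (start, GC fraction) pairs for windows along the sequence.

    Windows that would run past the end of the sequence are not emitted;
    a sequence shorter than one window is reported as a single window.
    """
    last_start = max(len(seq) - window + 1, 1)
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, last_start, step)
    ]
```

For example, `gc_windows("AAAAGGGG", window=4, step=4)` reports a GC fraction of 0.0 for the first window and 1.0 for the second, the kind of local contrast that can leave some stretches under-represented after amplification.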

A number of sample-preparation 
approaches reduce this GC bias. The amplifi- 
cation step is cut out completely in platforms 
made by Pacific Biosciences, based in Menlo 
Park, California, and in a method being devel- 
oped at Oxford Nanopore Technologies in 
Oxford, UK. And although DNA read lengths 
differ among platforms, the most widely used 
bench top sequencers — which are made by 
Illumina — generate short reads, of around 
150 base pairs. 

“The killer with short reads is that they're 
very sensitive to repeated content,” says Brown.


GENOMICS 


What makes a tough genome?

Certain features of DNA are challenging for high-throughput sequencing.

• Long sequences of repeated bases
• Missing bases in the original sequence
• Degraded or damaged DNA
• Regions rich in guanine and cytosine


If the read length is shorter than a repeat
(or, to draw on the book analogy, if the shreds
of the novel are only a fraction as long as a
repeated paragraph), the read is hard or even
impossible to place uniquely. “That’s where things
like long reads or other technologies can be 
helpful,” says Shendure. Long DNA fragments 
can bridge repetitive regions and thus help to 
map them. As another way to ease assembly, 
researchers in Shendure’s group and else- 
where are exploring different methods to tag 
and group DNA fragments before sequenc- 
ing. “There are more on the horizon,” says 
Shendure, but he prefers to divulge the details 
in research publications. 
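
The placement problem can be made concrete with a toy Python sketch. The ‘genome’ and reads below are invented for illustration, and exact string matching stands in for real read alignment, which also has to tolerate sequencing errors.

```python
def placements(genome, read):
    """Return every start position at which read matches genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Hypothetical genome: a 12-base repeat ("CAG" x 4) between unique flanks.
repeat = "CAG" * 4
genome = "TTGA" + repeat + "GTCC"

short_read = "CAGCAG"              # shorter than the repeat
long_read = "GA" + repeat + "GT"   # spans the repeat into both flanks

print(placements(genome, short_read))  # [4, 7, 10]
print(placements(genome, long_read))   # [2]
```

The short read fits at three positions inside the repeat, so it cannot be placed uniquely, whereas the read that reaches into the unique flanking sequence on both sides has a single placement.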

The terms ‘short’ and ‘long’ are in a state of 
flux in this fast-moving industry. The first gen- 
eration of Illumina machines generated reads 
of around 25 base pairs in length; the latest 
ones have upped that to around 150 base pairs 
(see ‘Extended sequence’). But it is still hard 
to assemble a complete genome from reads of 
this length. 

Geoff Smith, who directs technology 
development at Illumina in Cambridge, UK, 
acknowledges the drawbacks of short-read 
technology for sequencing repetitive regions 
and various types of genomic rearrangements. 
He says that the company aims to address 
issues that crop up as researchers compare 
genomes they sequence to reference genomes, 
or sequence organisms from scratch without 
references. 

Illumina has launched a service to allow 
longer reads with its current short-read tech- 
nology. Last year the firm bought Moleculo, a 
company based in San Francisco, California, 
which has developed a process to create long 
reads by stitching together short ones through 
a proprietary sample-preparation and com- 
putational process. In July Illumina began 
offering Moleculo’s process as a service for 
customers. 

The Moleculo process first creates DNA 
fragments about 10,000 bases (10 kilobases) 
in length. The fragments are sheared and 
amplified, then grouped and tagged with a 
unique barcode that helps to identify which 
larger fragment they originated from and aids 
in assembly. 
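
The role of the barcode can be sketched in a few lines. This is not Moleculo's proprietary process, whose details the article does not give; the barcode labels, the sequences and the `group_by_barcode` helper are all hypothetical. The sketch only shows the bookkeeping idea: short reads that share a barcode are collected together, so that each group can be assembled on its own as if it came from one long fragment.

```python
from collections import defaultdict


def group_by_barcode(tagged_reads):
    """Group (barcode, sequence) pairs by barcode, keeping input order."""
    groups = defaultdict(list)
    for barcode, sequence in tagged_reads:
        groups[barcode].append(sequence)
    return dict(groups)


# Invented short reads, each tagged with the barcode of its parent fragment.
tagged_reads = [
    ("BC01", "ACGTTG"),
    ("BC02", "GGCATA"),
    ("BC01", "TTGCAA"),
    ("BC02", "ATACCG"),
    ("BC01", "CAAGTC"),
]

groups = group_by_barcode(tagged_reads)
for barcode, reads in groups.items():
    print(barcode, reads)
```

Assembling each group separately shrinks the puzzle from a whole genome to a single fragment of roughly 10 kilobases, which is why the tag aids assembly.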
EXTENDED SEQUENCE

[Figure: Illumina’s first high-throughput sequencers produced very short reads. The Genome Analyzer IIx generated DNA sequence reads ~100 base pairs in length.]


At present, sample preparation for the 
Moleculo process takes around two days. 
Smith says that he and his team are refining the 
process and that by the end of the year Illumina 
will launch Moleculo as a stand-alone sample- 
preparation kit. He says that company scien- 
tists are now evaluating the kit’s performance 
by sequencing a well-known genome, but he 
prefers not to say which one. 

“We suspect you will be able to uncover 
a lot more of the genome with 10-kilobase 
reads versus the [150-base-pair] read length 
that we currently have,” says Smith. He adds
that the company plans to increase the frag- 
ment length to 20 kilobases. He and his team 
hope to “develop better molecular-biology 
tools to allow us to reach into these difficult- 
to-sequence parts of the genome but also use 
those tools on well-characterized genomes,”
he says. The team is also tuning the Illumina 
software to better distinguish between false 
and correct reads. 

The company’s initiative comes at a time 
of intense commercial and academic activity 
around long-read sequencing technology and 
new assembly methods. Finished genomes 
have taken a back seat, leaving many highly 
fragmented assemblies that need completing, 
says Jonas Korlach, chief scientific officer of 
sequencing manufacturer Pacific Biosciences 
in Menlo Park, California, whose sequencer 
generates read lengths of around 5 kilobases. 

Korlach agrees that long reads will help to 
sequence repetitive regions, such as those 
that characterize many plant genomes, for 
example. They will also help with the chal- 
lenge of distinguishing between copies of 
chromosomes, important in identifying the 
tiny variants that can affect biological func- 
tion. Humans are diploid, meaning they have 


two copies of each chromosome, but “many 
organisms, especially plants, have even more 
copies, which makes resolving all the different 
chromosomes so much harder”, Korlach says.


TOUGH NUTS 

Plant sequencing, in particular, will benefit 
from improvements’. The spruce genome is 
a “real nightmare”, says Stefan Jansson, a plant
biologist at the Umeå Plant Science Centre in
Sweden. Jansson led a study that generated a 
draft assembly of the Norway spruce genome 
(Picea abies)*. In addition to being the larg- 
est genome yet sequenced, it also contains 
many repeats, and the differences between its 
chromosomes are larger than in the human 
genome. “Sequencing diploid spruce is like 
mixing human and chimpanzee DNA and 
then trying to assemble them simultaneously,”
Jansson says. 

Many plants have more than two copies 
of chromosomes. Bread wheat (Triticum 
aestivum), for example, is hexaploid, and 
sequencing and assembling the six sets of 
chromosomes to completion has proven 
extremely difficult. And although some straw- 
berry species are diploid, the commercial 
strawberry (Fragaria × ananassa) is octoploid:
it has eight sets of seven chromosomes, four 
sets from each parent, says Thomas Davis, 
a plant biologist at the University of New 
Hampshire in Durham. “Good thing Mendel 
didn’t use octoploid strawberries to try to 
understand heredity,” he says.

Davis and his colleagues have published a 
draft genome of the diploid woodland straw- 
berry (Fragaria vesca), and now want to apply 
their experience to the octoploid strawberry’. 
Assembling this tough-nut genome will
require high-quality reads longer than 500 base
pairs, Davis says. He believes he can succeed,
although he does not want to share his methodology
just yet. “If anyone cracks that nut,
he'll do it,” says Kevin Folta, a molecular biologist
at the University of Florida in Gainesville,
who led the woodland-strawberry project.

[Figure, ‘Extended sequence’, continued: Reads from the HiSeq 2500 reach lengths of ~150 base pairs.]

The plant world has many other challenging 
genomes to offer. The onion genome is mas- 
sive, Folta says, and sugarcane has 12 copies 
of each chromosome. “Those will take special 
techniques,” he says.

Every platform has benefits and drawbacks, 
and scientists must weigh the costs, sample- 
preparation time and sequencing-error 
rates for each. To sequence the woodland 
strawberry, for example, the scientists used a 
combination of three platforms. 

But for polyploid genomes, short-read 
sequencing is almost a waste of time, says 
Clive Brown, chief technology officer at 
Oxford Nanopore. “You don’t know where 
your short read comes from, which chromo- 
some it is from,” he says. “It’s very hard to piece 
that together.” He believes that the problem will 
be helped by instruments, including those in 
development in his company, that can generate 
long reads without the need for special sample 
preparation or complex assembly. The longer 
the reads, the easier the assembly, because the 
overlapping sequences will help researchers to 
determine which sequence belongs to which 
chromosome. 

Fresh approaches were needed to crack the 
genome of the oil palm (Elaeis guineensis), 
reported last month in Nature°. The effort 
was more than a decade in the making. Oil 
palm is an important source of food, fuel and 
jobs in southeast Asia, and the industry is 
under pressure to produce it sustainably and 
avoid increased rainforest logging, says study 
co-leader Ravigadevi Sambanthamurthi,
director of the advanced biotechnology and 
breeding centre at the Malaysian Palm Oil 
Board in Kajang, which works with the 
country’s oil-palm industry. 

With millions of repeats distributed 
throughout the plant’s genome, short reads 
could fit in many possible spots in the assem- 
bled DNA sequence. “It is as if you were 
assembling a jigsaw puzzle in which most of 
the pieces are identical,” says Robert Martiens- 
sen, a geneticist at Cold Spring Harbor Labora- 
tory in New York, who co-led the project with 
Sambanthamurthi. 

Classic sequencing methods were too 
laborious and expensive for the oil-palm 
project. So Martienssen suggested applying 
a technique based on a finding he had made 
in 1998 — that repeats in plant genomes can 
be distinguished from genes because the cyto- 
sine bases in the repeats usually carry methyl 
groups. Before fragments are sent to the 
sequencer, they are treated with enzymes that 
digest methylated DNA and thereby remove 
the repeats from the samples. 

To complete the oil-palm project, the 
scientists applied this methylation-filtration 
technique and then sequenced the DNA 
regions housing genes. The technique has 
now been commercialized through Orion 
Genomics, a company based in St Louis, 
Missouri, which Martienssen co-founded. 

The researchers used a high-throughput 
sequencer made by 454 Life Sciences, a com- 
pany owned by Roche and based in Branford, 
Connecticut, that generates short reads from 
longer, filtered fragments. In preparing the 
samples, the researchers used bacteria to 
amplify DNA in large chunks on bacterial arti- 
ficial chromosomes — an approach also used 
in the Human Genome Project — to pin down 
hard-to-map regions by retaining them next to 
genes with known positions to act as signposts. 

Assembly of the oil-palm genome called 


for extensive computational resources, which 
crashed multiple times, the researchers say. 
But now, with the genome in hand, they have 
located a gene that encodes the shell of the 
palm fruit, knowledge they hope to harness to 
increase the plant's yield. 

Sambanthamurthi says that when the 
researchers finally pinned down the shell 
gene, they popped a bottle of champagne, then 
celebrated with a traditional Malaysian meal 
served on a banana leaf. 


THE LONG AND THE SHORT 

Bacterial genomes are smaller and less 
complex than those of plants and other multi- 
cellular organisms, but they, too, have regions 
that are tough to sequence. For example, 
Bordetella pertussis, which causes whooping 
cough, has hundreds of insertion sequence 
elements — stretches of mobile DNA inserted 
into various locations in the genome — each
more than 1 kilobase long. Proponents of 
long-read technology say that spanning these 
regions with long reads will deliver sequencing 
efficiency gains. 

Korlach points out that it took a team of 
more than 50 scientists to solve the bacterium’s 
complete genome’. But long-read technol- 
ogy can make assembly of highly repetitive 
genomes faster and easier, he says. He says that 
he and scientists in the Netherlands were able 
to assemble nine whooping-cough bacterial 
strains in one month. 

Whether a read is classified as ‘long’ or 
‘short’ is in great flux. Two years ago, scien- 
tists might have said that a long read was 
1 kilobase, Korlach says. “Now [Pacific Bio- 
sciences] customers are generating an average 
of 5,000 bases, with some reads longer than 
20,000 bases — and we are working to deliver 
even more than that.” Ultimately, a ‘long read’
will be as long as is needed to sequence a given 
genome, he says. 

[Image caption: Oil palm has a complex and repeat-riddled genome that took more than ten years to sequence.]

Korlach knows that some scientists say his
company’s sequencers are pricey, but he says
that the newer versions have seen a significant 
drop in price and an increase in throughput. 
He says that the question of price is often raised 
“in the context of pure cost per sequenced 
base”. And, he adds, if a certain sequencing 
technology is the only one that will work to 
solve a medically important question, “then 
there is no price tag 
that can be put on this 
medically relevant 
information”. 

Last year, researchers collaborating with
Pacific Biosciences used the company’s sequencer
to distinguish the repetitive genomic region
involved in fragile X syndrome, a developmental
disorder that is caused by repeats in a particular
region on the X chromosome, and that worsens in
severity with higher numbers of repeats.

[Image caption: Titus Brown likens high-throughput sequencing to piecing together 1,000 shredded copies of a novel.]

As technology developers get closer to 
instruments that produce longer reads, scien- 
tists will need longer DNA fragments at the 
beginning of their sequencing experiments. 
Several companies focus on helping research- 
ers to prepare DNA fragments for sequencing. 
For example, Sage Science, based in Beverly, 
Massachusetts, has a platform that uses pulsed- 
field electrophoresis to select and sort DNA 
fragments of sizes ranging from 50 base pairs 
to 50,000 base pairs. In May, the company 
began marketing its instrument to accompany 
the Pacific Biosciences sequencing platform. 

Steve Siembieda, who is responsible for 
business development at Advanced Analytical 
Technologies in Ames, Iowa, says that his company
sees the trend towards longer reads as the
writing on the wall. The company has licensed
patents from Iowa State University, also in 
Ames, to develop an instrument to assess the 
integrity, fragment length and concentration 
of DNA samples. 

With this instrument, an electric field is 
applied to a tiny amount of DNA so that it is 
pulled into a long, hair-thin capillary tube containing
a gel with a fluorescent dye that binds
to DNA molecules. As the DNA fragments 
move through the gel, they separate according 
to size. “Small molecules move fast, big molecules
move slowly,” Siembieda says. As the
molecules pass by a window in the capillary, 
a flash of light excites the dye and a camera 
records the DNA fragment length (see ‘Bits 
and pieces’). 

[Figure: ‘Bits and pieces’. The Fragment Analyzer, made by Advanced Analytical Technologies, tests DNA quality, concentration and fragment length before sequencing. Sample tray: DNA in a multi-well plate is pulled into hair-thin tubes containing a gel and a fluorescent dye that binds DNA.]

The instrument’s readout tells scientists
whether the size distribution of the DNA
fragments is in the range needed for a given
sequencing platform and whether the DNA
has the right concentration. Siembieda says
that skipping these measurements can be the
wrong experimental shortcut: if the concentration
or fragment size is off, “a sequencer
may run for nine days, it will cost them thousands
of dollars, plus all the time wasted to not
make sure they have the appropriate material”.
The instrument will possibly be used in developing
the Moleculo process, but negotiations
between the two companies are still under way.
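
The kind of check described here can be sketched as follows. This is only an illustration and not the instrument's software: the function names, the measurements and every threshold (the size window, the 75% cut-off and the minimum concentration) are hypothetical placeholders chosen for the example.

```python
def fraction_in_range(fragment_lengths, min_bp, max_bp):
    """Return the share of fragments whose length lies within [min_bp, max_bp]."""
    if not fragment_lengths:
        return 0.0
    in_range = sum(1 for length in fragment_lengths if min_bp <= length <= max_bp)
    return in_range / len(fragment_lengths)


def sample_is_suitable(fragment_lengths, concentration, min_bp, max_bp,
                       min_fraction=0.75, min_concentration=1.0):
    """Apply two illustrative checks: size distribution and concentration.

    The default thresholds are hypothetical, not values from any vendor.
    """
    sized_ok = fraction_in_range(fragment_lengths, min_bp, max_bp) >= min_fraction
    concentrated_ok = concentration >= min_concentration
    return sized_ok and concentrated_ok


# Invented measurements (lengths in base pairs, concentration in arbitrary units).
intact_sample = [9500, 10200, 800, 11000, 10050]
degraded_sample = [500, 700, 900, 10000]

intact_ok = sample_is_suitable(intact_sample, concentration=2.5,
                               min_bp=8000, max_bp=12000)
degraded_ok = sample_is_suitable(degraded_sample, concentration=2.5,
                                 min_bp=8000, max_bp=12000)
print(intact_ok, degraded_ok)
```

Running such a check before loading a sequencer is cheap compared with a multi-day run on unsuitable material, which is the point Siembieda makes.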

Technology development at Advanced 
Analytical is focusing increasingly on long DNA 
fragments, which are challenging to resolve, 
Siembieda says. One solution is to customize 
gels for different applications. At present, the 
company’s instrument can resolve lengths of 
up to 20 kilobases and the company is working 
on resolving longer fragments, he says. 


ASSEMBLY REQUIRED 

Scientists are applying many methods and tricks 
to create longer fragments. “Unfortunately, 
these technology tricks create erroneous data 
at points, so now you’re stuck with some data
that may be wrong,” says Michigan State's Titus 
Brown. He was part of an effort, published in 
April’, to sequence the lamprey (Petromyzon 
marinus) genome, one-third of which is cov- 
ered by long repeats. Obtaining an assembly 
even with Sanger sequencing, which generates 
1-kilobase reads, was difficult, he says. In addi- 
tion, the lamprey genome has many GC-rich 
regions. The team used several types of software 
to assemble the complete DNA sequence. 

In July, scientists published a comparison of 
software programs used to assemble sequence 
reads'”. The researchers found that different 
assemblers give different results — even when 
fed the same sequence reads. Brown says that
biologists should never forget that assemblies
are not certainties. Every new sequencing
technology, from how the DNA sample is
prepared to the sequencing chemistry, has
the potential for error and bias. “If you have
short reads, or bad biology, you're going to
have a very hard time getting a good assembly,
even in theory,” he says.

[Figure, ‘Bits and pieces’, continued. Labels: high-voltage power supply; fluorescence excitation; detection window. DNA fragments are imaged as they pass by a charge-coupled device camera.]

Ideally, a genome assembly should deliver 
end-to-end chromosomal sequences, says 
Shendure. What worries him more than the 
discordance among assemblers in the compari- 
son study is that all of the assemblies were very 
fragmented. “That’s not a fault of the assem- 
blers, that’s a fault of the data that we're putting 
into the assemblers and the fact that we're not 
capturing contiguity at these longer scales,” he 
says. “The algorithms can only make do with 
the ingredients that they are provided by the 
technologies.” 
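
One common way to put a number on how fragmented an assembly is, not mentioned in the article itself, is the N50 statistic. The sketch below assumes the usual definition (the contig length at which the longest contigs, added in descending order, first cover at least half of the total assembly length) and uses invented contig lengths; `n50` is a hypothetical helper name.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given the lengths of its contigs."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


fragmented = [100] * 10        # ten contigs of 100 bases each
contiguous = [700, 200, 100]   # same total length in far fewer pieces

print(n50(fragmented), n50(contiguous))
```

Both toy assemblies contain 1,000 bases, but the second captures far more contiguity, which is the property Shendure says current data fail to deliver at longer scales.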

Brown is hopeful about the potential 
impact of longer-read technology. If Pacific 
Biosciences or Oxford Nanopore “deliver on 
many inexpensive long reads — more than 
10 kilobases, I'd say — regardless of accuracy, 
you would end up revolutionizing the genome- 
assembly field, because it would give you so 
much more information to work with”, he says.
However, he adds, assembly software has to be 
compatible with each sequencing method. “So 
we’re continually playing catch up, where new
sequencing technologies lead to new sequence- 
analysis approaches a year or three later.” 

Eichler agrees that sequencing and assembly 
must continue to improve. Read lengths longer 
than 200 kilobases and with 99.9% accuracy 
rates will be needed to unpick repeats and 
other complications, he says. He says that the
Pacific Biosciences instrument and what he
knows of Moleculo “fall short of this, but are
on the right track”. All read-length requirements
depend on the genome and complexity,
he adds. For many bacterial genomes, current
read lengths and accuracy are already sufficient,
he says.

268 | NATURE | VOL 501 | 12 SEPTEMBER 2013

© 2013 Macmillan Publishers Limited. All rights reserved


THE NEXT TELESCOPE 

Oxford Nanopore plans to launch its new 
sequencing technology in the near future, 
but no date has been given. The technology 
expands on findings by researchers at Harvard 
University in Cambridge, Massachusetts, the 
University of Oxford, UK, and the University 
of California in Santa Cruz to harness the 
abilities of pore-forming proteins for DNA- 
sequencing devices. 

One of the weaknesses of current high- 
throughput sequencing technology is ampli- 
fication chemistry, says Oxford Nanopore’s 
Clive Brown. Although DNA is made up of 
four bases, it is possible that more than those 
canonical four — such as bases that are methyl- 
ated — should be detected, he says. 

And in some sections of genomes, bases are 
naturally missing. But current sequencers do 
not capture such variations — instead, says 
Brown, they produce the equivalent of a four-colour
photocopy of a picture with many more
colours. “A lot of the detail is lost immediately, 
as soon as you make a four-colour copy,” he 
says. Ideally, “you take a chromosome and 
run it through the sequencer. You can't quite 
do that yet.” He, too, says that the next crucial 
phase of sequencing technology will be about 
long reads. 

Brown says that to his mind, sequencers are 
just opening the door to characterizing the 
genome. People can get “very cosy about what 
they can see”, with scientific instruments, he
says. He likens today’s sequencers to the first 
telescopes, which offered a view of the Moon’s 
features and exploration of the visible spectrum.
“It gets you a long way, you can count the
stars, see the planets,” he says. But the telescope 
does not show other celestial phenomena — 
such as dark matter or galactic movement. 

Like astronomers with their telescopes, 
genome researchers will get a clearer picture 
of the genome as the sequencing technologies 
improve, he says. And, inspired by that picture, 
they will strive to see even more.


Vivien Marx is technology editor for Nature 
and Nature Methods. 
ADMINISTRATION


A watchful eye on 
grant funding 


Researchers disillusioned with the lab and eager to engage 
their soft skills can find promise in scientific administration. 


Most scientists worry about how to get
grants. But a select few make a career out
of managing those grants to ensure that the
money is well spent. This is the job of the
scientific administrator.

Broadly speaking, scientific administration 
involves the awarding, spending and tracking 
of funding at the grant, programme or policy 
level. Grant-level administrators assign or 
manage funds given to individual investiga- 
tors. Programme administrators look after the 
needs of multi-investigator or multi-institution 
projects. And policy administrators oversee 
funding for entire departments, institutions 
or even university systems. 

Jobs at all levels exist at universities, fed- 
eral agencies and foundations, and scientific 
administrators often flip between these worlds 
at different stages of their careers — perhaps 
awarding grants at institutions such as the US 
National Institutes of Health (NIH) or UK bio- 
medical charity the Wellcome Trust at one time, 
and managing funds at universities at another. 

Scientific administrators have a crucial 
role in the research process, says Ginny Cox 
Delaney, an organizational consultant in the 
Oakland Office of the President of the Univer- 
sity of California system, which administers 
ten research universities and five medical cen- 
tres. “For me, science administration means 
the value-added role of advancing an organization’s
research goal, besides doing the actual
science,” she says. 


HUMBLE BEGINNINGS 

Almost no one sets out to be a scientific admin- 
istrator. Instead, they realize at some point that 
they would rather support the scientific pro- 
cess away from the bench than at it. 

When Elizabeth Prescott started her PhD, 
she had trouble visualizing where she would 
be in five years. After graduating, she delib- 
erately chose a postdoc in the lab of a young 
investigator, at Yale University in New Haven, 
Connecticut. “I got a first-hand view of what it 
looked like to be a junior faculty member,” says
Prescott, noting the long hours the job entailed.
“I don't think I love this enough,” she thought.

She was considering leaving her fellowship 
and searching for a job in industry when the 
director of the postdoctoral-affairs office at 
Yale told her about an upcoming position. 
The university was establishing numerous core 
facilities in fields including DNA sequencing 
and high-performance computing, and the
provost’s office needed someone to act as a
liaison. Apart from a requirement to monitor 
how effectively the facilities served the needs 
of the university’s scientists, the position was 
not well defined. So when Prescott got the job, 
she had a bit of a baptism of fire. “I learned 
about non-profit accounting; I learned about 
grant compliance,” she says. She also learned
to evaluate investigators’ needs, and then sort 
them into different priorities. 

In doing so, she became a link between the 
investigator, the granting agency and other 
parts of the university. These days, Prescott 
works as a foundation relations adviser at the 
Fred Hutchinson Cancer Research Center in 
Seattle, Washington, where she also acts as a 
liaison — this time between the centre and 
the companies and foundations that want to 
donate to it. Having scientific knowledge gives 
her credentials with all sides, she says. “It helps 
to have an understanding, to know what it is 
like to work in the lab. You don’t want to be 
perceived as a bean counter or a barrier.” 

‘Liaison’ is one of many roles of scientific
administrators. ‘Head of logistics’ is another. 
As a scientific review officer for the NIH in 
Bethesda, Maryland, Shiv Prasad recruits 
qualified scientists three times a year to review 
grant applications. To say that organizational 
skills are essential in his work would be an 
understatement, he notes. “When you have 
20 or more scientists coming to a study sec- 
tion from around the world, you want to make 
sure things run like clockwork. That means 
getting them their applications to review on 
time, giving them enough notice to make travel 
plans and making sure rooms are booked.” 

It takes a certain kind of person to excel in 
administration — someone who enjoys repeat- 
edly arranging and following up on meetings, 
and who understands how something as mun- 
dane as the wording on a form can affect the 
application experience, says Jonathan Best, 
grant operations manager at the Wellcome 
Trust in London. “I enjoy taking a process 
and making it work as efficiently as it can,” he 
says. “It takes me to a whole new level of geek- 
dom and challenges me in a totally different 
way to my previous scientific positions.” Best 
provides help at every step, from application, 
making sure that the applicant has filed the cor- 
rect information, to evaluation — seeing that 
members of the review committee get the appli- 
cations in time. Currently his office is updat- 
ing forms and revising deadlines to make the 
process less onerous for the applicant. 

Administering grants also involves moni- 
toring their success and implementation — 
activities that require attention to detail and a 
willingness to police recipients. After a grant 
has been awarded, Best helps to monitor how 
effective it was in advancing the investigator's 
field. He examines metrics such as publications 
and citations, as well as less tangible outcomes, 
including whether recipients have led effectively 
in their fields rather than following trends. 


An ability to deal with deadlines is essential, 
says Diane McFadden, associate director of 
the Northeast Biodefense Center at Columbia 
University in New York — an NIH centre of 
excellence that involves 28 investigators and 
3 postdocs at 12 institutions. With so many dif- 
ferent components to manage, deadlines “are 
constantly hitting you”, she says. And attention
to detail is paramount: if someone does not fill 
out a form correctly and McFadden misses the 
error, funds or equipment could be delayed, 
slowing down projects. 

All these things can be learned on the job, 
as can an understanding of how the funding 
and follow-up processes work. What cannot 
be taught, says Prasad, is a passion for science. 
Keeping up with the latest advances, helping 
to shepherd promising ideas through funding 
and seeing the research turn into successful 
publications and applications are the most 
satisfying parts of the job, he says. 


ADVANCED ADMINISTRATION 

As administrators gain seniority, they tend to 
take on broader responsibilities. High-level 
scientific administrators need to be able to see 
the big picture, says Carl Rhodes, a senior sci- 
entific officer at the Howard Hughes Medical 
Institute (HHMI) in Chevy Chase, Maryland. 
He helps the institution to select and review 
new investigators, who are funded by the 
HHMI but based at universities. Until recently, 
he also helped to plan and run seven scientific
meetings a year. “You need to decide what the
agenda is and if it makes sense,” he says. “You
need to decide how to advertise it, what is the
most logical progression for the programme.”

Administrators often learn to deal with
increasing complexity as they move from
monitoring and giving grants to single
investigators, to focusing on whole programmes,
departments or agencies. Rhodes started his
teaching and administration career at Stanford
University in California, where he focused on
“things other faculty didn't want to do”, he
says, such as coordinating teaching assistants,
refining the curriculum and dealing with the
occasional disciplinary problem. He viewed
himself as a service provider. “I made myself
useful,” he says.

[Pull quote: “Science administration means the value-added role of advancing an organization’s research goal.” Ginny Cox Delaney]

In that role, Rhodes developed key skills 
and insights that would serve him well in a
later job as a grants officer at the NIH, where 
he managed a review panel that evaluated 
joint PhD-medical doctorate programmes.
It was also useful later on: sifting through 
1,000 graduate-school applications at a time 
at Stanford was at least as difficult as triaging 
hundreds of new-investigator applications 
at the HHMI, he says. He now oversees
scientific advisory panels that evaluate the 
strengths and weaknesses of each investiga- 
tor’s application and the contributions that 
they would make to their fields. The review 
officer combines the panel’s comments into a 
single document — and all the documents go 
through Rhodes for final approval. 

The ability to understand, create and eval- 
uate budgets is essential to scientific admin- 
istrators — especially as they move up the 
career ladder. McFadden has become adept at 
budgeting to make sure that her investigators’ 
needs are met. “I’ve learned a tremendous 
amount about building a budget and how to 
do that with fairness,” she says. Some projects 
deserve extra attention — and providing that, 
without belittling or alienating other projects 
in the consortium, can be a delicate process.

Having concrete goals — while keeping in 
mind a broader picture — is another hallmark 
of success in upper policy-level administra- 
tion, says Sue Rosser, provost and vice-pres- 
ident for academic affairs at San Francisco 
State University in California. “The higher up 
you get in administration, the more emphasis 
there is on having a goal and getting things 
done and on time,” she adds. Rosser works
to make sure that the myriad meetings that 
populate her days are not empty exercises, 
but produce something tangible. “I like to
have agendas, outcomes, follow-ups, results,” 
she says. For example, last year her institution 
decided not to renew funding for a multi- 
year, multimillion-dollar project that did not 
fit the university’s mission. More than a year 
before the programme was due to end, Rosser 
and her team began to have meetings about 
the closure, with detailed timelines to com- 
plete the programme’s activities and ensure 
a smooth transition to an interim grantee. 
Rosser, like Rhodes, has a service mindset 
— but she is not so closely connected to the 
people she helps. 

“I am trying to help the faculty under 
me succeed,” she says. “And I am not even 
directly doing that, I am helping deans help 
chairs help faculty.”


A WAY IN

Most people who move from a research career 
to scientific administration do so when they 
realize that lab work does not meet their life 
and career goals. There is no obvious, well- 
worn career path that reliably culminates in 
an administrative post, but there are ways to 
get a foothold — and to find out whether it 
is the right route. When Cox Delaney rec- 
ognized that she did not have “the golden 
hands” needed to succeed in the lab, she got a 
public-policy fellowship from the American 


Association for the Advancement of Science 
in Washington DC. That had her working 
on science-policy cooperation between the 
United States and western Europe, which 
helped her to develop her interpersonal 
communication skills. A subsequent job at 
the Alfred P. Sloan Foundation, a non-profit 
granting organization based in New York, 
taught her about scientific funding. 

Cox Delaney says that graduate stu- 
dents and postdocs who think they might 
be interested in administration can gather 
experience by running symposia, planning 
talks, dealing with 
caterers or book- 
ing speakers and 
venues for confer- 
ences. “There are 
a lot of opportuni- 
ties to step up into 
leadership roles,” 
she says. “That will 
give you a sense of 
whether you like 
the organizing 
piece.” 


Even pulling authors together for a paper or
coordinating a simple event such as a journal
club can help, if the organizers can observe
the people they are working with and find out
what motivates them, or can build new skills,
says Cox Delaney. She is currently working with
all the University of California campuses to
find ways to share and save on administrative
costs, so that they can pass the savings on to
research and education.

[Pull quote: “You need to decide what is the most logical progression for the programme.” Carl Rhodes]

Advancing in administration often means 
nurturing a skill set that goes well beyond 
research. Cox Delaney puts people who 
go into scientific administration into two 
categories: people with a strong understand- 
ing of science and good interpersonal skills; 
and people adept at accounting and fund- 
ing. Administrators who have both sets of 
traits are relatively rare — and they are the 
ones who tend to rise to the highest levels, 
she says. 

Many universities — especially publicly 
funded, research-based ones — recruit their 
leading managers from a pool of scientists, 
who tend to be detail-oriented and adept at 
handling complexity, and researchers with a 
particular appetite and aptitude for manag- 
ing budgets and people will have multiple 
career options in administration. The path to 
broader responsibilities, or even a university 
presidency, could begin with shepherding a 
single grant.


Paul Smaglik is a freelance writer based in 
Milwaukee, Wisconsin. 
RESEARCH

Biology in space 

NASA is launching an open-ended 
research programme to investigate how 
human and other tissue reacts to time 
spent in space. The geneLAB project
will begin seeking grant applications by
autumn 2014, says D. Marshall Porterfield, 
director of space life and physical sciences 
research at NASA in Washington DC. 

It will award ‘innovation exploration’
grants of US$100,000 for one year; full 
grants will be for up to 5 years and worth a 
maximum of $500,000. The agency wants 
to send organisms including fruitflies and 
roundworms to the International Space 
Station to learn how spaceflight affects 
living tissue at the biomolecular and 
genetic level. Future grant recipients would 
also study bone loss and examine tissue 
from crew members to look for changes 
to their DNA that occurred while in space 
and after returning to Earth. 


GENDER 
Mothers’ careers stalled 


Attitudes about motherhood can
impede women’s career aspirations,
even at companies that purport to have
family-friendly policies, a study suggests
(C. Herman et al. Gender Work Organ.
20, 467-478; 2013). Women working
in science, engineering and technology
at multinational corporations in the
Netherlands, France and Italy adopted 
potentially career-damaging tactics 
including, for example, avoiding big 
projects and disguising the need to leave 
early or come in late because of childcare 
obligations, the study found. Firms must 
take stock of how attitudes stymie women 
who are looking to advance, says study 
co-author Anne Laure Humbert, a gender 
researcher at the European Institute for 
Gender Equality in Vilnius. 


FUNDING 


Genius grant grows 


Awards for the MacArthur Fellows 
Program, known as genius grants, will this 
year rise from US$500,000 to $625,000. 
The John D. and Catherine T. MacArthur 
Foundation in Chicago, Illinois, makes 
unrestricted 5-year grants to recipients 
chosen for their creativity, innovation and 
potential to shape the future. Spokesman 
Andrew Solomon says that the increase
is partly a response to inflation and is the
programme’s fourth rise since it began in
1981. The 2013 fellows will be announced 
on 25 September. 
What Frank Green did wasn't
difficult. All it took was a large
amount of computing power and
many weeks of meticulous research in
musty book stacks. By mid-July he felt 
ready to test his findings. 

He wasn’t sure quite what he’d
found, but the pattern was unmis- 
takable. For most of recorded his- 
tory, people have been disappearing 
without trace, often right in front of 
astonished friends and acquaintances. 

The pair of professors tasked with 
reviewing Frank’s work proved scep- 
tical in the extreme, even after he 
showed them the simulation — the 
one that tracked the lacunae over time 
and space, the one that then predicted 
where and when these holes in space 
and time might occur. But two days 
later Frank was in Alberta, standing 
outside his tent where he’d bivouacked 
halfway up a deer trail at nearly 6,000 
feet, still stunned that they had said yes. 

“Trust us son,” Professor Wag- 
ner had said. “We've had stranger 
requests. Just bring back good hard 
data. That’s what the bean-counters
like.”

And there lay the problem in a nut- 
shell. 

How do I collect data on the absence 
of something? 

It wouldn't be for want of trying. He was 
armed with cameras, EMF meters and as 
much computing power as he was able to 
carry around with him. Now all he had to 
do was wait, and trust in his theory. 

As he lay in his tent that night he pondered 
the wisdom of even being here. Yes, it might 
prove his theory. But there was also a chance, 
a very real chance, that he too would become
one of the disappeared, just another name on 
a list of unsolved vanishings. It was a chance 
he was willing to take, for if he was right, he 
was on the verge of solving some of the great 
mysteries, from the missing Roman legion 
all the way to the Marie Celeste. Fortune and 
glory might wait just around the corner. 

As night fell he daydreamed; of book 
deals and chat shows, of front-page headlines
and TV appear-
LACUNAE


The rhythm of life. 


tossing and turning in his sleeping bag, he 
got up and checked over some of the cases 
that had brought him to this place. He spent 
the next few hours studying the apocryphal 
tales surrounding the Flannan lighthouse in
Scotland, and the supposed disappearance of
three of the keepers in 1900 — another dot 
on his map. 

He’d worked for many long hours on the 
map, approaching it from many angles, 
looking for a pattern, a rhythm that might 
correlate the disappearances with some 
physical aspect that could be measured by 
his instruments. He'd cross-referenced his 
pattern with changes in the magnetic field, 
with sunspot activity, with lunar cycles, even 
with daily fluctuations in the stock market. 
He only hit the jackpot when he took into 
account the orbit around the Sun, and the 
long variations caused by the precession of 
the equinoxes. He found that there was a 
certain region of space that, whenever Earth 
passed through it, caused the now famous 
disappearances. The scale of the vanishing 
seemed to depend on how close the planet 
brushed against this spatial anomaly. And 
it was that anomaly that Frank hoped to 
measure over the coming day. 

He finally fell asleep with dreams of 
fortune and glory, and woke at dawn to a
sparkling clear morning, and the sound of 
his meters clicking. 

It was happening. It was really happening. 

The air tingled, as if suddenly supercharged
with static, and the hairs on
Frank’s arms and neck all stood up
at once. Reality seemed to slide and 
slip, as if the very fabric of nature was 
melting and becoming blurred. Frank 
struggled to focus on his equipment, 
but that too seemed to blur and fade, 
so much so that no meter readings 
were possible. 

Just bring back good hard data. 
That’s what the bean-counters like.

Frank wailed. His dream of glory 
was fading fast, and without data all 
would be for nothing. He scrambled 
among the equipment, desperate to 
reach his laptop. He was so preoccu- 
pied that he failed to notice that the 
world outside his tent had gone. 

The laptop went away just as he put 
a hand on it. Blue static ran across his 
skin and crackled like damp sticks in 
a fire. 

Frank blinked. In that time, he too 
went away. 

He floated, in darkness filled with 
swirling fog containing occasional 
flashes of blue lightning. There was 
no sound, but there was a profound 
feeling of being at rest, of being cod- 
dled by the thick fog. 

There was no panic, and Frank's curiosity 
overcame any sense of fear. He examined the 
timing of the lightning, hoping once again 
to discern a pattern, a rhythm that he might 
identify and use to find a way out of his pre- 
dicament. 

The fog started to part and clear. A blue 
haze hung ahead of him, and Frank drifted 
towards it, carried by an unseen tide. He was 
eagerly awaiting what would come next. 

Right up until the moment when the blue 
haze parted and a vast mouth filled with 
banks of razor teeth opened up. 

He realized his main mistake even as the 
jaws clamped shut at his waist. 

There was one rhythm common through- 
out nature that he'd failed to take into account. 

Feeding time.